PREFACE


RAND WAS COMMISSIONED by the Office of the Assistant Secretary of
Defense (Systems Analysis) to prepare a book on the subject of military
equipment cost-estimating procedures. This memorandum deals with
fundamentals of cost analysis and constitutes the introductory portion of
such a book. In addition to the material presented here, the complete
book will deal with uncertainty, methods and techniques for estimating
costs of military equipment such as aircraft, and cost models. Emphasis
is placed on cost-estimating techniques that are applicable across a
broad spectrum of major military equipment. Consequently, it is hoped
that this memorandum, which represents a selection of the more general
areas covered in the book, will be useful throughout the Department of
Defense and the aerospace industry.


SUMMARY


THIS  MEMORANDUM  is  a  compilation  of  topics  related  to  equipment  cost 
estimating.  These  topics  are  treated  in  five  separate  sections:  (1) 
cost-estimating  methods,  (2)  data  collection  and  adjustment,  (3)  sta¬ 
tistical  methods  in  development  of  estimating  relationships,  (4)  use 
of  cost-estimating  relationships,  and  (5)  the  learning  curve. 

There  are  three  basic  methods  used  for  cost  estimation — the  indus¬ 
trial  engineering,  analogy,  and  statistical  approaches.  The  industrial 
engineering approach represents an examination of separate segments of
work  at  a  low  level  of  detail  and  a  synthesis  of  the  many  detailed  es¬ 
timates  into  a  total.  The  method  of  analogy  is  based  on  direct  com¬ 
parisons  with  historical  information  on  like  components  of  existing 
systems.  In  the  statistical  approach,  as  defined  in  this  memorandum, 
estimating  relationships  with  parametric  explanatory  variables,  such 
as  weight,  speed,  power,  frequency,  and  thrust,  are  used  to  predict 
cost. This approach is usually applied at a higher level of aggregation than the
industrial  engineering  approach. 

Of the three approaches to cost estimating, statistical methods
are  considered  to  be  the  most  useful  for  government  analysts  in  a  wide 
range  of  application,  whether  the  purpose  is  long-range  planning  or 
contract  negotiation.  Any  estimating  method,  however,  is  basically  a 
projection from past experience, and to make this projection it is
necessary to have a reliable data base. This must include information on
the  cost,  physical  and  performance  characteristics,  and  on  the  develop¬ 
ment  and  production  history  of  previous  hardware  programs.  In  addition, 
because  the  data  must  be  comparable  to  be  useful,  adjustments  must  be 
made  for  definitional  differences,  production  quantity  differences, 
and yearly price changes.

In the discussion on statistical methods, a hypothetical example
is used to demonstrate the procedures and techniques of this method.
First, attention is given to a simple linear regression, with a single
explanatory variable. Next, a logarithmic transformation of this
relationship is treated. Finally, multiple regressions are performed in
various  pairwise  combinations  of  three  explanatory  variables.  These 
multiple  regressions  are  performed  for  both  linear  and  nonlinear  (log¬ 
arithmic)  relationships. 

The  limitations  of  estimating  relationships  stem  from  two  sources: 
first,  the  uncertainty  inherent  in  any  application  of  statistics;  and, 
second, the uncertainty that an estimating relationship is applicable
to  a  particular  situation.  Important  considerations  that  can  be  easily 
overlooked  during  a  purely  formal  statistical  analysis  include  (1)  the 
reasonableness  and  structural  soundness  of  the  estimating  relationship, 
(2)  the  importance  of  the  analyst’s  familiarity  with  the  actual  hard¬ 
ware,  and  (3)  systematic  bias  by  the  analyst.  Although  the  value  of 
statistical  estimating  relationships  should  not  be  discounted  (their 
widespread use and general applicability attest to their worth),
caution is recommended in applying these relationships outside the data
base  from  which  they  were  derived. 

The  last  section  covers  the  subject  of  learning  curves,  which  are 
used  to  predict  reductions  in  cost  as  the  number  of  items  produced  in¬ 
creases.  The  learning  process  prevails  in  many  industries,  and  its 
existence  has  been  verified  by  empirical  data.  The  factors  that  account 
for  this  learning  trend  are  generally  attributed  to  such  items  as  job 
familiarization,  development  of  more  efficient  tools,  and  improvement 
in  overall  management.  The  basis  of  learning-curve  theory  is  that  each 
time  the  total  quantity  of  items  produced  doubles,  the  cost  per  item 
is  reduced  to  a  constant  percentage  of  its  previous  cost.  Such  a  rela¬ 
tionship  (log-linear)  may  be  expressed  in  terms  of  unit  cost  or  cumula¬ 
tive  average  cost.  In  practice,  the  unit  cost  is  most  frequently  con¬ 
sidered  to  be  linear,  but  there  are  sufficient  exceptions  to  suggest 
that the choice must be based on experience.
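
As a minimal sketch of the doubling rule just stated, assume the unit
cost is the log-linear quantity, so that unit cost = a * n**b with
b = log(slope) / log(2). The first-unit cost of 100 and the 80 percent
slope below are invented for illustration, and the function names are
hypothetical.

```python
import math

def learning_exponent(slope):
    """Exponent b in cost = a * n**b; slope is e.g. 0.80 for an 80 percent curve."""
    return math.log(slope) / math.log(2.0)

def unit_curve_cost(first_unit_cost, n, slope):
    """Cost of the n-th unit when the unit curve is log-linear."""
    return first_unit_cost * n ** learning_exponent(slope)

def cumulative_average(first_unit_cost, n, slope):
    """Average cost of units 1..n implied by the same unit curve."""
    total = sum(unit_curve_cost(first_unit_cost, i, slope)
                for i in range(1, n + 1))
    return total / n

first_unit = 100.0   # invented first-unit cost
slope = 0.80         # invented slope
for n in (1, 2, 4, 8):
    # Each doubling of n multiplies the unit cost by the slope.
    print(n, round(unit_curve_cost(first_unit, n, slope), 2),
          round(cumulative_average(first_unit, n, slope), 2))
```

If the cumulative average rather than the unit cost is taken to be
log-linear, the same power form would be applied to the average instead,
and unit costs would be derived from it.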

When  learning  curves  are  displayed  graphically,  the  problem  arises 
of how to plot the average cost for a lot or a complete contract, since,
typically, man-hours or costs are not recorded by each unit. For the
cumulative  average  curve,  the  plot  point  is  simply  the  endpoint  of 
each  lot,  since  this  is  the  point  where  the  cumulative  average  figure 
is  applicable.  For  the  unit  curve,  calculating  the  plot  point  is  more 
complex, and approximations are widely used. Plotting representative
unit costs for contract lots is important, especially for the early
points, whose misplacement could lead to improper conclusions about the
cost-quantity  relationship. 
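
One way to locate the plot point for a lot on the unit curve can be
sketched as follows. The sketch assumes a log-linear unit curve with a
slope that is already known or guessed, which is an assumption of this
illustration and not necessarily the approximation the memorandum goes
on to describe. The plot point is the unit number at which the curve
equals the average unit cost of the lot. The function name
lot_plot_point is hypothetical.

```python
import math

def lot_plot_point(first, last, slope):
    """Unit number Q at which a log-linear unit curve equals the lot average.

    first, last: first and last unit numbers in the lot (inclusive).
    slope: assumed learning-curve slope, e.g. 0.80; must not equal 1.0.
    """
    b = math.log(slope) / math.log(2.0)
    n = last - first + 1
    # Average of n**b over the lot; the first-unit cost cancels out.
    mean_factor = sum(i ** b for i in range(first, last + 1)) / n
    return mean_factor ** (1.0 / b)

# Invented lots: units 1-10 and units 91-100, both on an 80 percent curve.
early = lot_plot_point(1, 10, 0.80)
late = lot_plot_point(91, 100, 0.80)
print("early lot:", early, "arithmetic midpoint:", 5.5)
print("late lot:", late, "arithmetic midpoint:", 95.5)
```

The early lot's plot point falls well below its arithmetic midpoint,
while the later lot's is very close to it, which is why misplacing the
early points matters most.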

In the application of learning curves to problems associated with
cost estimating, the analyst must be cognizant of the wide variations
possible and the reasons for such variations. A thorough knowledge of
the learning-curve phenomenon is indispensable to persons involved in
cost analysis.


I. COST-ESTIMATING METHODS


A COST ESTIMATE is a judgment or opinion regarding the cost of an
object, commodity, or service. This judgment or opinion may be arrived
at formally or informally by a variety of methods, all of which are
based on the assumption that experience is a reliable guide to the
future. In some cases the guidance is clear and unequivocal; e.g.,
bananas cost 15¢ per pound last week; it is estimated that they will
cost about 15¢ per pound next week, barring unforeseen circumstances
such as a freeze in Guatemala. At a more sophisticated level,
average costs are calculated and used as factors to estimate the cost to
excavate  a  cubic  yard  of  earth,  to  fly  an  airplane  for  an  hour,  or  to 
drive  an  automobile  a  mile.  Much,  perhaps  most,  estimating  is  of  this 
general  type,  i.e.,  where  the  relationship  between  past  experience  and 
future  application  is  fairly  direct  and  obvious. 

The  more  interesting  problems,  however,  are  those  in  which  the 
relationship  is  unclear,  because  the  proposed  item  differs  in  some 
significant  way  from  its  predecessors.  The  challenge  to  cost  analysts 
concerned  with  military  hardware  is  to  project  from  the  known  to  the 
unknown,  to  use  experience  on  existing  equipment  to  predict  the  cost 
of next-generation missiles, aircraft, and space vehicles. The
challenge is not only in new equipment designs; new materials, new
production processes, and new contracting procedures also add to uncertainty.
These innovations are sometimes accompanied by expectations of cost
increases or of cost reductions that must be carefully evaluated.

The techniques used for estimating hardware cost range from intuition
at one extreme to a detailed application of labor and material
cost standards at the other. One of the military services’ manuals on
cost estimating lists five basic methods: industrial engineering
standards; rates, factors, and catalog prices; estimating relationships;
specific analogies; and expert opinion. Other sources put the number
at two (synthesis and analysis), three (round-table estimating,
estimating by comparison, and detailed estimating), or four (analytical
appraisal, comparative analysis, statistical analysis, and use of
standards). In this section, the discussion will be limited to three
techniques (the industrial engineering approach, analogy, and the
statistical approach), and it is the last of these that will be of
primary concern throughout the remainder of the memorandum.

Estimating  by  industrial  engineering  procedures  can  be  broadly 
defined  as  an  examination  of  separate  segments  of  work  at  a  low  level 
of  detail  and  a  synthesis  of  the  many  detailed  estimates  into  a  total. 
Statistical  estimating  is  sometimes  defined  as  a  statistical  extrapo¬ 
lation  to  produce  an  estimate-at-completion  after  progress  has  been 
made  on  a  job  and  costs  or  commitments  have  been  experienced,  but  this 
is  not  the  sense  in  which  the  term  is  used  in  this  study.  In  the  sta¬ 
tistical  approach,  estimating  relationships  that  use  explanatory  vari¬ 
ables  such  as  weight,  speed,  power,  frequency,  and  thrust  are  relied 
on  to  predict  cost  at  a  higher  level  of  aggregation.  Figure  1  illus¬ 
trates  this  difference  in  level  of  detail.  At  the  lowest  level  of  de¬ 
tail,  the  estimator  begins  with  a  set  of  drawings  and  specifies  each 
engineering  task,  tool  requirement,  or  production  operation,  including 
the  labor  and  material  required.  This  is  sometimes  referred  to  as 
"grass-roots"  estimating. 

Table  1  illustrates  the  detail  required  at  the  lowest  level  of 
estimating;  in  this  case  a  labor  cost  estimate  for  forming  a  steel 
center  bracket.  The  name  and  number  of  the  operations  and  the  machines 
that  will  be  used  are  given  with  estimates  of  setup  and  operating  time 
and labor cost. When they exist, standard setup and operating costs
are used in making estimates, but if standards have not been
established (which is frequently the case in the aerospace industry), a
detailed study is made to determine the most efficient method of
performing each operation. A standard may be a "pure" standard or an
"attainable" standard, but for a specified condition, it is essentially
the minimum time required to complete a given operation and
theoretically should be approached asymptotically when the planned
production rate is attained.

STATISTICAL PROCEDURE

•  Engineering direct labor hours
•  Engineering materials
•  Engineering direct charges
•  Tooling direct labor hours
•  Tooling materials and purchased tools
•  Tooling direct charges
•  Quality control direct labor hours
•  Quality control direct charges
•  Manufacturing direct labor hours
•  Manufacturing materials and purchased parts
•  Manufacturing

INDUSTRIAL ENGINEERING PROCEDURE

•  Number of engineers by department and task
•  Type and quantity of materials and test equipment
•  Type of direct charge: computer rental, reproduction services,
   travel and per diem
•  Type and quantity of specific tools required
•  Type of direct charge: equipment rental, blueprint services
•  Work center and station requirements or percentage of direct
   labor hours
•  Tasks by manufacturing processes: fabrication, subassembly, final
   assembly, and checkout
•  Parts list, specific type and quantity of raw materials, scrap
   and rejects
•  Type of direct charge: reproduction services, travel and per diem
•  Parts list items: landing gear, environmental control,
   secondary power, instruments

Fig. 1: Levels of aggregation for estimating purposes


Table 1: DETAILED LABOR COST ESTIMATE FOR FORMING A STEEL CENTER BRACKET

Standards are not widely used in the aerospace industry for
estimating costs, although they are used extensively for other purposes,
such as control of shop performance. Standards are best applied when
a long, stable production run of identical items is envisaged; in the
aerospace industry, however, emphasis is often placed on development
rather than on production. The Gemini program provides an extreme
example: Twelve spacecraft of varying configurations were developed
and produced at a cost of $700 million. Other examples would be less
dramatic, but it is true that compared with industry in general,
production runs of advanced military and space hardware tend to be short,
and both design configurations and production processes may continue
to evolve even after several hundred units have been completed. This
means  that  standards  are  continually  changing — one  standard  applies 
at  unit  50,  another  at  other  production  quantities.  Because  changes 
are  unpredictable,  it  is  difficult  to  establish  standards  that  will 
be  applicable  at  some  specified  production  quantity  in  advance  of 
production  experience. 

Industrial  engineering  estimating  procedures  require  consider¬ 
ably  more  personnel  and  data  than  are  likely  to  be  available  to  gov¬ 
ernment  agencies  under  any  foreseeable  conditions.  One  of  the  largest 
aerospace  firms  judges  that  the  use  of  this  approach  in  estimating  the 
cost  of  an  airframe  requires  about  4500  estimates;  for  this  reason, 
the  firm  avoids  making  industrial  engineering  estimates  whenever  pos¬ 
sible.  They  take  too  much  time  and  are  costly  to  both  contractor  and 
government  during  a  period  of  limited  funds.  Moreover,  for  many  pur¬ 
poses  they  have  been  found  to  be  less  accurate  than  estimates  made 
statistically.  One  reason  is  simply  that  the  whole  often  turns  out 
to be greater than the sum of 4500 parts. The detail estimator works
under  the  same  disadvantages  as  do  all  other  estimators  before  an  item 
has  been  produced.  He  works  from  sketches,  blueprints,  or  word  de¬ 
scriptions  of  some  item  that  has  not  been  completely  designed,  and  he 
can assign costs only to work that he knows about. (An attempt is
sometimes made to estimate the completeness of the work statement and this
estimate becomes a factor to apply to the detail estimate; e.g., if the
work statement is judged to be 50 percent complete, the detail estimate
is multiplied by two.) The effect of a low estimate is compounded
because detail estimating is normally attempted only on a portion of
production  labor  hours.  A  number  of  production  labor  elements,  such 
as  rework,  planning  time,  and  coordination  effort,  are  usually  factored 
in  as  percentages  of  the  detail  estimate.  Then,  other  cost  elements, 
such  as  sustaining  effort,  tool  maintenance,  quality  control,  and  manu¬ 
facturing  research,  are  factored  in  as  percentages  of  production  labor. 
Thus,  small  errors  in  the  detail  estimate  can  result  in  large  errors 
in  the  total. 
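
The compounding described in this paragraph can be illustrated with a
short sketch. All of the percentages and hours below are invented for
the example, and the function name total_labor_hours is hypothetical.
The completeness adjustment follows the rule quoted above (a 50 percent
complete work statement doubles the detail estimate).

```python
def total_labor_hours(detail_hours, completeness,
                      production_factors, other_factors):
    """Build a total from a detail estimate by successive factoring.

    detail_hours: grass-roots estimate of the detailed portion of labor.
    completeness: judged completeness of the work statement (0 < c <= 1).
    production_factors: elements taken as fractions of the detail estimate.
    other_factors: elements taken as fractions of production labor.
    """
    adjusted = detail_hours / completeness
    production = adjusted * (1.0 + sum(production_factors.values()))
    return production * (1.0 + sum(other_factors.values()))

# Invented factors, for illustration only.
production_factors = {"rework": 0.10, "planning and coordination": 0.10}
other_factors = {"quality control and other effort": 0.25}

base = total_labor_hours(1000.0, 0.5, production_factors, other_factors)
high = total_labor_hours(1100.0, 0.5, production_factors, other_factors)
print("total:", base)
print("effect of a 100-hour error in the detail estimate:", high - base)
```

With these invented factors every hour in the detail estimate carries
three hours in the total, so an error in the detail estimate is
multiplied by the same amount.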

A  second  reason  for  considering  industrial  engineering  standards 
less  accurate  than  estimates  made  statistically  has  already  been  sug¬ 
gested.  Significant  variability  in  the  fabrication  and  assembly  of 
successive  production  units  is,  and  will  continue  to  be,  characteris¬ 
tic  of  the  industry.  Production  runs  of  like  models  tend  to  be  of  lim¬ 
ited  length  and  are  characterized  by  numerous  design  changes.  In  the 
case  of  military  aircraft,  production  rates  have  tended  to  vary  fre¬ 
quently  and  at  times  unexpectedly.  The  proportion  of  new  components 
in  equipment  is  probably  higher  in  the  aerospace  industry  than  in  any 
other.  The  effect  of  these  factors  can  be  represented  statistically 
by the learning or progress curve so characteristic of this industry.
One set of fabrication and assembly modes is succeeded by more
efficient production functions, which lower the total labor requirement.
The introduction of engineering changes causes discontinuities in this
process but does not interfere with the general trend. If new
manufacturing processes and techniques are introduced, these may cause
changes in past relationships. History, however, seems to show that
changes  in  manufacturing  and  management  techniques,  although  they  may 
have  dramatic  impacts  in  circumscribed  areas,  tend  to  result  in  only 
gradual  changes  over  the  entire  process. 

Because  a  private  concern  generally  has  information  only  on  its 
own  products,  much  of  the  estimating  in  industry  is  based  on  analogy, 
particularly when a firm is venturing into a new area. For example,
in the 1950s, aircraft companies bidding on ballistic missile programs
drew analogies between aircraft and missiles to develop estimates for
the latter. Douglas Aircraft Company (now McDonnell-Douglas) made a
good estimate on the Thor intermediate-range ballistic missile by
comparing Thor with the DC-4 transport airplane. This company later based
its estimates of the Saturn S-IV stage on its Thor experience. Even
with  appropriate  adjustments  for  differences  in  size,  the  number  of 
engines,  higher  performance,  and  insulation  problems  (the  need  to  cope 
with  liquid  hydrogen  as  well  as  liquid  oxygen),  this  attempt  was  not 
as  successful  as  the  first. 

At all levels of aggregation, much estimating is performed by
this type of analogy: System A required 100,000 hours; given the
likenesses and differences in design and in performance of proposed
System B, the requirement for B is estimated at, say, 120,000 hours.
Or, at a different level, engineers and shop foremen may rely on
analogies when making a grass-roots estimate; in this event, analogy
becomes part of the industrial engineering approach. The major drawback
to estimating by analogy is that it is essentially a judgment process
and, as a consequence, requires considerable experience and expertise
to be done successfully. For the government cost analyst, analogy can
be useful for a rough check of an estimate; however, when making
estimates, analogy based on a sample of 1 adjusted by some complexity
factor should be avoided. This caveat rests on the contention that first,
it is poor statistics; second, it is nonreproducible; and third, it
cannot be evaluated by the user of the estimate.

Although statistical procedures are preferable in most situations,
there are circumstances when analogy or industrial engineering
techniques are required because the data do not provide a systematic
historical basis for estimating cost behavior. It may be that a new item
is  to  be  constructed  of  some  unfamiliar  material,  or  that  a  design 
consideration  is  so  radically  different  that  statistical  procedures 
are  inadequate.  The  use  of  new  structural  material  for  aircraft  often 
requires  the  development  of  special  cutting  and  forming  techniques 
with  manufacturing  labor  requirements  that  differ  significantly  from 
those based on a sample of primarily aluminum airframes. Faced with
this problem when titanium was first considered for use in airframe
manufacture, airframe companies developed standard-hour values for
titanium fabrication on the basis of shop experience in fabricating test
parts and sections. Ratios of these values to those for comparable
operations on aluminum aircraft were prepared, and these ratios were
then used in existing statistical estimating relationships. Thus,
while industrial engineering procedures were used to provide input data,
the approach remained statistical.

A similar situation occurs in the case of industrial facilities.
Requirements  for  these  cannot  be  estimated  without  knowing  the  contrac¬ 
tor's  identity  and  the  extent  and  availability  of  his  existing  plant. 
Consequently,  the  cost  of  facilities  must  be  estimated  from  information 
available  for  each  specific  case. 

There  will  always  be  situations  in  which  analogy  or  industrial  en¬ 
gineering  techniques  are  required,  but  in  general  the  statistical  ap¬ 
proach  is  useful  in  a  wide  range  of  contexts,  whether  the  purpose  is 
long-range  planning  or  contract  negotiation.  In  the  former,  a  more 
highly  aggregated  procedure  may  be  used  because  it  ensures  comparabil¬ 
ity  when  little  detailed  knowledge  about  the  equipment  is  available. 
Total  hardware  cost  may  be  estimated  as  a  function  of  one  or  more  ex¬ 
planatory  variables;  e.g.,  engine  cost  as  a  function  of  thrust,  or 
transmitter  cost  as  a  function  of  power  output  and  frequency.  However, 
this  approach  is  often  a  matter  of  necessity,  not  choice.  Even  for 
long-range  planning,  it  is  sometimes  desirable  to  estimate  in  some 
detail.
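
An estimating relationship with two explanatory variables, such as
transmitter cost as a function of power output and frequency, might be
fitted as in the sketch below. The data are invented (constructed so
that cost = 10 + 2 * power + 0.5 * frequency exactly), the units are
arbitrary, and the helper names are hypothetical. The fit is ordinary
least squares solved through the normal equations.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy, so the caller's lists are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_multiple(rows, ys):
    """Least squares for y = b0 + b1*x1 + b2*x2 + ...; returns [b0, b1, ...]."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Invented sample: (power output, frequency) and cost, arbitrary units.
observations = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
costs = [13.0, 14.5, 18.0, 19.5, 23.0]

coefficients = fit_multiple(observations, costs)
print("intercept, power, frequency:", coefficients)
```

A real sample would not fit exactly, and the residual scatter would
then have to be examined before the relationship is used for
prediction.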

To  say  that  statistical  techniques  can  be  used  in  a  variety  of 
situations  does  not  imply  that  the  techniques  are  the  same  for  all  sit¬ 
uations.  They  will  vary  according  to  the  purpose  of  the  study  and  the 
information  available.  In  a  conceptual  study,  it  is  necessary  to  have 
a  procedure  for  estimating  the  total  expected  costs  of  a  program,  and 
this  must  include  an  allowance  for  the  contingencies  and  unforeseen 
changes  that  seem  to  be  an  inherent  part  of  most  development  and  pro¬ 
duction  programs. 

Similarly,  a  long-range  planning  study  will  use  industry-wide 
labor and burden rates and an estimated learning-curve slope; later in
the acquisition cycle, data that are specific for a particular
contractor in a particular location can be used. In effect, this procedure
merely asserts the obvious: As more is known, fewer assumptions are
required. When enough is known, and this means when a product is well
into  production,  accounting  information  and  data  can  be  taken  directly 
from  records  of  account  and  used  with  a  minimum  of  statistical  manip¬ 
ulation.  This  technique  is  useful  only  in  those  cases  when  the  future 
product  or  activity  under  consideration  is  essentially  the  same  (both 
in  terms  of  configuration  and  scale  of  production  or  operation)  as 
that  for  the  past  or  current  period. 

In  any  situation  the  estimating  procedure  to  be  used  should  be 
determined  by  the  data  available,  the  purpose  of  the  estimate,  and, 
to  an  extent,  by  such  other  factors  as  the  time  available  to  make  an 
estimate. The essential idea to be conveyed in this section is that,
when properly applied, statistical procedures are varied and flexible
enough to be useful in most situations that aerospace equipment cost
analysts are likely to encounter. Although no specified set of
procedures can guarantee accuracy, decisions must be made; it is
essential that they be based on the best possible information. The analyst
must seek the approaches that will provide the best possible answers,
given the basic information that is available.

Although  the  content  of  this  memorandum  is  limited  to  methods  of 
estimating  equipment  cost,  any  decision  to  undertake  a  new  program 
typically  takes  into  consideration  far  more  than  the  outlays  needed 
to  develop  and  produce  the  equipment.  For  example,  there  may  be  a 
need  for  complementary  hardware,  such  as  launchers  or  test  equipment; 
possibly  additional  construction  will  be  needed,  such  as  lengthened 
runways  or  hardened  shelters.  Other  investment  items  may  include  the 
cost  of  personnel  training,  computer  programming  services,  and  develop¬ 
ment  of  technical  data.  However,  a  number  of  items  that  contribute  to 
system  operating  cost  (particularly  spares)  are  usually  estimated  as 
a  function  of  total  equipment  cost. 

In  addition  to  the  initial  investment  that  is  needed  to  estab¬ 
lish  a  new  capability,  there  are  costs  of  operating  and  maintaining 
equipment that continue as long as it is in the active inventory. These
recurring costs include

•  Replacement of common (or organizational) equipment.
•  Replenishment of spare parts and supplies.
•  Fuels, lubricants, and propellants.
•  Training ordnance and other expendables.
•  Personnel costs.
•  Facilities maintenance.
•  Training of replacements.
•  Maintenance and other logistics support by separate
   organizations.

These operating costs are far more important in the lifetime total
cost computation than their annual figure might suggest. In fact,
since the life of a modern weapon system may run ten years (or longer),
the investment needed to establish a new system may be dwarfed by the
costs required to operate and to maintain it. The practical
consequence of this observation is that when the overall study is
constrained by time and personnel limitations, as is often the case, the
estimation of equipment costs can be accorded only a reasonable share
of the time and personnel available for the whole study.
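
The arithmetic behind this observation is simple enough to sketch. The
dollar figures are invented, the ten-year life is the one mentioned
above, and no discounting is applied; the function name lifetime_cost is
hypothetical.

```python
def lifetime_cost(investment, annual_operating, years):
    """Undiscounted lifetime total: initial investment plus recurring costs."""
    return investment + annual_operating * years

# Invented figures, in millions of dollars.
investment = 500.0
annual_operating = 80.0
years = 10

total = lifetime_cost(investment, annual_operating, years)
operating_share = annual_operating * years / total
print("lifetime total:", total)
print("operating share of total:", round(operating_share, 3))
```

In this invented case the annual operating cost is only 16 percent of
the investment, yet over ten years operating costs make up more than
half of the lifetime total.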


II.  DATA  COLLECTION  AND  ADJUSTMENT 


THE  GOVERNMENT  has  been  collecting  cost  and  program  data  on  weapon  and 
support  systems  for  many  years — sometimes  in  detail,  sometimes  in  highly 
aggregated  form.  Consequently,  it  is  surprising  that  the  right  data 
seldom seem to be available when an estimating job is required. It
appears that the needs of the cost analyst have not always been
considered in designing the many information systems that have been used by
the  Army,  Navy,  and  Air  Force.  Data  have  been  collected  for  program 
control,  for  program  management,  and  for  program  audit,  but  this  infor¬ 
mation  has  never  been  systematically  processed  and  stored.  Instead, 
after  a  few  years  it  has  generally  been  discarded  or  placed  in  not  read¬ 
ily  accessible  warehouses.  Moreover,  the  data  were  often  inconsistent 
since  they  were  gathered  according  to  the  requirements  of  each  military 
service  and  each  program  manager.  To  obtain  the  data  to  develop  esti¬ 
mating  relationships,  the  analyst  has  had  to  use  contractor  records. 


Data  Collection 


The Cost Information Report (CIR) was established in 1966 to
alleviate the problem of data collection. This reporting system was
designed to collect costs and related data on major contracts for
aircraft and missile and space programs to assist industry and government
in estimating and analyzing the costs of these programs. Information
from other sources (contract records, management records, and the like)
can be processed to complement the CIR and thus make complete program
histories available. (Subsequent sections of this study describe the
methods of analysis that this information was designed to serve.) As
data  accumulate  over  a  period  of  years,  the  need  for  ad  hoc  collection 
efforts  should  diminish.  These  efforts  will  never  disappear  completely, 
however, as information systems cannot be designed to satisfy every
data requirement. Under ideal conditions, the analyst would have data
with  which  to  develop  estimating  techniques  responsive  to  any  demand, 
but  even  the  largest  contractors  are  reluctant  to  allocate  the  resources 
required  to  put  estimators  in  such  a  favorable  position,  and  the  cost 
to  the  Department  of  Defense  (DOD)  for  such  data — much  of  which  would 
seldom  be  used — would  be  prohibitive.  However,  a  government  analyst  or 
estimator has one great advantage over his counterpart in industry:
He has a much broader data base to draw on.

A minimum data requirement exists for any given job, but before
data collection begins the analyst must consider the scope of his
problem, define generally what he wants to do, and decide how to do it.
The data required to estimate equipment costs for a long-range
planning study can be substantially less than those needed to prepare an
independent cost estimate for contract negotiation. In the former,
total equipment costs may suffice; in the latter, costs must be
collected at the level of detail in which the contract is to be
negotiated. For major items, this means a functional breakout, e.g., direct
labor, materials, engineering, and tooling. One could postulate
problems requiring an even greater amount of detail. Suppose, for example,
that two similar hardware items had substantially different costs.
Only by examining the cost detail could this difference be explained.

In performing this initial appraisal of the job, the analyst will
be aided by a thorough knowledge of the kind of equipment with which
he will be dealing: its characteristics, the state of its technology,
and the available sample. With this knowledge he can determine the
kinds of data that are required and that are available for what he
wants to do, where the data are located, and the kinds of adjustments
that may be required to make the collected data base consistent and
comparable. Only after the problem has been given this general
consideration should the task of data collection begin. All too often
large amounts of data are collected with little thought about use.
The result is that some portion may be unnecessary, unusable, or not
completely understood. Data collection is generally the most
troublesome and time-consuming part of cost analysis. Consequently,
careful planning in this phase of the overall effort is well worthwhile.

Historical  Data 

To  develop  a  cost-estimating  procedure,  at  least  three  different 
types  of  historical  data  are  required.  First,  there  are  the  resource 
data,  usually  in  the  form  of  expenditures  and  labor  hours.  It  is  cus¬ 
tomary  to  apply  the  word  cost  to  both,  and  that  practice  is  followed 
throughout  this  text.  A  second  type  of  data  describes  the  possible 
cost-explanatory  elements;  for  hardware  such  as  aircraft  and  missiles 
this  means  performance  and  physical  characteristics.  The  third  type 
is program data, i.e., information related to the development and
production history of past hardware programs.

Resource  Data 

Resource data are generally classified under end-item categories
or functional categories. Examples of the former, at various possible
levels of detail, are system, subsystem, component, and part. The
functional cost categories, such as engineering, tooling, manufacturing,
quality control, and purchased equipment, are usually broken down into
cost elements: labor, material, overhead, and other direct charges. The
data source is the contractor's plant. Generally, the accounting
systems will vary from one company to another, and the amount of detail
is immense. A typical airframe company, for example, sets up the
production process on the basis of a number of different jobs or
stations, each identified by a number or symbol. All manufacturing
direct labor and material (depending on the type of cost-accounting
system) expended on a given job is recorded on a job order or, as is
becoming increasingly common, fed directly into a computer. When such a
system is used, the actual hours incurred for every operation are
available to management, and these costs can be aggregated as they are
needed. Manufacturing costs of this type can be attributed to a lot or
often to a single unit. (Some categories of cost are not identifiable by
lot or unit, e.g., tooling and engineering.) But since contractors
organize their work differently, different job orders will be used. This
means that data at more detailed levels may vary from contractor to
contractor and may not be comparable. Also, detailed information of this
kind is unnecessary for most government analysis and should rarely be
sought.

If there were a need to estimate in more detail, the data required
would increase by at least an order of magnitude, and data processing
equipment would become a necessity. When to incorporate automatic data
processing techniques into the data collection effort is determined
primarily by the volume of data to be handled. The trend in the
aerospace industry is to rely more and more on computers for internal
data needs, and for some purposes data have been provided to the
government on punched cards or magnetic tape. Thus, there are no
technical reasons why cost data could not be obtained in this form
should it be more convenient to the cost analyst but, as mentioned
earlier, there are good reasons not to use excessive detail even if it
is readily available: expense increases and accuracy is unlikely to
improve.

Theoretical considerations aside, estimating techniques must be
based on whatever resource data the analyst can find, and in the past
the availability of data has varied from one kind of equipment to
another. To illustrate, aircraft airframe estimating procedures tend to
be different from those developed for other types of equipment. An
airframe model may contain all of the following categories:

• Initial and sustaining engineering.
• Flight test operations.
• Initial and sustaining tooling.
• Manufacturing labor.
• Manufacturing material.
• Quality control.

Such a list of cost categories is desirable for all hardware
estimating, but because of data limitations, present procedures for
engines often cover only two phases of the procurement cost, development
and production, and avionics procedures only one, procurement cost to
the government. The CIR should expand these possibilities in the future.

Physical  and  Performance  Characteristics 

Information  about  the  physical  and  performance  characteristics  of 
aircraft  and  missile  and  space  systems  is  just  as  important  as  resource 
data.  Data  collection  in  this  area  can  be  time-consuming,  particularly 
since  it  is  not  often  clear  in  advance  what  data  will  be  required.  The 
goal,  of  course,  is  to  obtain  a  list  of  those  characteristics  that  best 
explain  differences  in  cost.  Weight  is  a  commonly  used  explanatory 
variable, but weight alone is seldom enough; speed is almost always
included as a second explanatory variable for aircraft airframes. One
estimating procedure* for aircraft uses all of the following:

• Maximum speed at optimal altitude.
• Maximum speed at sea level.

•  Year  of  first  delivery. 

•  Total  airframe  weight. 

• Increase in airframe weight from unit 1 to unit n.

•  Weight  of  installed  equipment. 

•  Engine  weight. 

•  Electronics  complexity  factor. 

In  addition,  the  following  characteristics  were  considered  for  inclu¬ 
sion  as  part  of  the  estimating  procedure,  although  they  were  not  used: 

•  Maximum  rate  of  climb. 

•  Maximum  wing  loading. 

•  Empty  weight. 

• Maximum altitude.

•  Design  load  factor. 

•  Maximum  range. 

•  Maximum  payload. 


*Planning Research Corporation, Methods of Estimating Fixed-wing
Airframe Costs, Vol. I (Revised), PRC R547A, Los Angeles, April 1967.

At the outset of a study undertaken to develop an estimating
relationship for aircraft cost, the cost analyst would not know which of
all  these  characteristics  would  provide  the  best  explanation  of  vari¬ 
ations  among  the  cost  of  different  aircraft;  he  would  of  necessity  try 
to  be  as  comprehensive  as  possible.  An  analyst  who  is  familiar  with 
the  type  of  hardware  under  study  will  have  some  idea  of  the  most  likely 
candidates,  but  he  will  generally  consider  more  characteristics  than 
will  eventually  be  used. 

Program  Data 

A  third  type  of  essential  data  is  drawn  from  the  development  and 
production  history  of  hardware  items.  The  acceptance  date  of  the  item, 
the  significant  milestones  in  the  development  program,  the  production 
rates,  and  the  occurrence  of  major  and  minor  modifications  in  produc¬ 
tion — all  such  information  can  contribute  to  the  development  of  cost- 
estimating  relationships.  The  list  of  explanatory  variables  discussed 
in  the  previous  section  includes  year  of  first  delivery  and  increase 
in  airframe  weight  from  unit  1  to  unit  n,  information  that  would  be 
included in the category program data.

An  airframe  typically  changes  in  weight  during  both  development 
and  production  as  a  result  of  engineering  changes.  For  example,  the 
weight of the F-4D varied as follows:

Cumulative          Airframe
Plane Number        Unit Weight (lb)

  1- 11 ..........     8456
 12-186 ..........     8941
187-241 ..........     8541
242-419 ..........     9193

Since labor hours are commonly associated with weight to obtain
hours-per-pound factors, it is important to obtain weights applicable to
each production lot if airframe weights by unit are not available.
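
As a rough illustration of why lot-specific weights matter, the sketch
below looks up the airframe unit weight that applies to a given
cumulative plane number, using the F-4D figures above, and divides labor
hours by it. The labor-hour figure and the function names are
hypothetical and serve only to show the arithmetic.

```python
# Airframe unit weight (lb) by cumulative plane number, from the F-4D figures above.
F4D_WEIGHT_BLOCKS = [
    (1, 11, 8456),
    (12, 186, 8941),
    (187, 241, 8541),
    (242, 419, 9193),
]


def airframe_weight(plane_number):
    """Return the airframe unit weight (lb) that applies to a plane number."""
    for first, last, weight in F4D_WEIGHT_BLOCKS:
        if first <= plane_number <= last:
            return weight
    raise ValueError(f"no weight recorded for plane {plane_number}")


def hours_per_pound(labor_hours, plane_number):
    """Labor hours divided by the weight applicable to that unit."""
    return labor_hours / airframe_weight(plane_number)


# Hypothetical: the same 100,000 hours gives different factors in different blocks.
print(round(hours_per_pound(100_000, 10), 2))   # plane in the 1-11 block
print(round(hours_per_pound(100_000, 300), 2))  # plane in the 242-419 block
```

Using a single weight for the whole program would blur the factor by the
difference between the lightest and heaviest blocks.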

The need for other kinds of program data will be clarified under
the discussion on data adjustment. To cite one example here, the year
in which expenditures occur must be known to adjust cost data for price
level changes. (This is the reason for at least one CIR submission
annually.) A certain amount of program data cannot be specified with
this degree of precision nor can the use of these data be foretold, but
the information is important nonetheless. It is what might be called
background information: data on other activities in the contractor's
plant at the time a particular hardware item is being built; unusual
problems the contractor may be encountering; attempts to compress or
stretch out the program; and inefficiencies that are noted. This
information may be useful in explaining those factors that appear to be
aberrations when the resource data are compared with those from other
development and production programs. In addition, a history of a
contractor's overhead, general and administrative costs, and labor rates
is useful for analyzing and predicting costs.


Data  Adjustment 

To  be  useful  to  the  cost  analyst,  data  must  be  consistent  and  com¬ 
parable,  and  in  most  cases  the  data  as  collected  are  neither.  Hence, 
before  estimating  procedures  can  be  derived,  an  adjustment  must  be  made 
for  definitional  differences,  production  quantity  differences,  yearly 
price  changes,  and  so  on.  The  more  common  adjustments  are  examined  in 
this  section.  It  is  by  no  means  an  exhaustive  treatment  of  the  subject: 
The  list  of  possible  adjustments  is  long  and  many  of  them  will  apply 
only  in  a  very  small  number  of  cases.  Also,  evidence  on  certain  types 
of  adjustments  (for  contractor  efficiency,  for  contract  type,  for  pro¬ 
gram  stretch-out)  consists  largely  of  opinion  rather  than  hard  data. 
While  the  cost  analyst  may  allude  to  such  adjustments,  the  research 
necessary  to  treat  them  in  some  definitive  way  has  not  yet  been  done. 

Definitional  Differences 

Different contractor accounting practices and make-or-buy
arrangements are primary reasons why adjustment of the basic cost data
is generally necessary. Companies record their costs in different ways.
Often they are required to report costs to the government by categories
that differ from those used internally. Also, government reporting
categories change from time to time. Because of these definitional
differences, one of the first steps in cost analysis is to state the
definition that is being used and to adjust all data to this definition.
With the inception of the CIR, a standard set of definitions for
airframes has been established for use throughout the DOD. A primary
purpose of the CIR is to overcome the problem of definitional
differences in hardware cost data. For the next few years, however, most
data will antedate the CIR and some adjustment will be required.

As an example of what may be expected, a cost analyst may be
examining data from a sample of ten hardware items and discover that the
cost category Quality Control is missing for some of the earlier items.
He may conclude that no quality control was exercised in the 1950s or
that this function is included in another cost element. The latter
assumption is correct. Traditionally, Quality Control was carried in
the burden account, and it was only in the late 1950s that it began to
appear (at the request of the DOD) as a separate element. Hence, to
use cost data on equipment built prior to this change requires
converting a portion of overhead cost to Quality Control.

A more current example involves Planning, which in the CIR
definition is included in Tooling. Planning consists of two components:
tool planning and production planning. A company may put the first in
Tooling and the second in Manufacturing. Other practices are to include
tool planning in Engineering, to put all planning in Manufacturing, or
to include a portion in Overhead.

Table 1 illustrates this problem more concretely. A slightly
abbreviated version of the CIR list of cost elements appears on the
left; on the right are the cost elements used by a large aerospace
company and the nonrecurring costs of a proposed airframe. The lists are
different and, as shown by Table 2, a simple rearrangement of the
contractor cost elements does not solve the adjustment problem. Four of
the contractor cost elements remain: Developmental Material ($2.6
million), Outside Production ($70 thousand), Other Direct Charges ($2.7
million), and Manufacturing Overhead ($28.94 million). These are not
trivial adjustments: these four elements can amount to well over half
the total cost of a large production contract. Developmental Material
presumably would be split between Engineering Material and
Manufacturing Material; Other Direct Charges would have to be allocated
among Engineering, Tooling, Quality Control, and Manufacturing; and part
of Manufacturing Overhead would be apportioned to Tooling Overhead and
Quality Control Overhead. In each of these instances, the contractor who
furnished the CIR information would be able to make the necessary
adjustments from his own accounting records. Outside Production costs,
although small in this example, may constitute 30 to 40 percent of the
total cost of an airframe in some cases. When this happens, the labor
hours and material costs incurred by the prime contractor fall far short
of the total required to build an airplane; a method of arriving at a
total must be devised to permit the data to be analyzed on a comparable
basis, i.e., on an equivalent 100-percent in-plant basis. Ordinarily,
the contractor would have a detailed breakout of costs only for
subcontractors on cost-reimbursable contracts, and other Outside
Production costs would have to be allocated to the specified categories.
Production labor hours incurred out of plant, for example, are often
estimated on the basis of the weight of that portion of the airframe
being built out of plant. In using historical data, the analyst may be
in a similar position: when the amounts involved are large, he should be
guided by whatever information the contractor can provide.


Table 1

ILLUSTRATIVE COMPARISON OF CIR AND AIRFRAME CONTRACTOR COST ELEMENTS

                                       Airframe Contractor

CIR                                                     Nonrecurring Costs
Cost Element                 Cost Element                  ($ thousands)

Engineering                  Engineering ...................  8,600
  Direct labor
  Overhead                   Manufacturing
  Material                     Developmental direct labor ..  2,500
  Other direct charges         Tooling direct labor ........ 11,600
                               Production direct labor .....    850
Tooling
  Direct labor               Developmental material ........  2,600
  Overhead                   Tooling material ..............  2,600
  Materials and              Production material ...........    500
    purchased tools          Purchased equipment ...........      5
  Other direct charges       Outside production ............     70

Quality Control
  Direct labor               Inspection ....................    620
  Overhead
  Other direct charges       Other Direct Charges ..........  2,700

Manufacturing                Overhead
  Direct labor                 Engineering ................. 10,200
  Overhead                     Manufacturing ............... 28,940
  Materials and
    purchased parts
  Other direct charges

Purchased Equipment

Material Overhead


Table 2

AIRFRAME CONTRACTOR COST ELEMENTS ARRANGED IN CIR FORMAT

                                         Airframe Contractor

CIR                                                          Nonrecurring Costs
Cost Element                      Cost Element                  ($ thousands)

Engineering
  Direct labor                    Engineering                       8,600
  Overhead                        Engineering overhead             10,200
  Material
  Other direct charges

Tooling
  Direct labor                    Tooling direct labor             11,600
  Overhead
  Materials and purchased tools   Tooling material                  2,600
  Other direct charges

Quality control
  Direct labor                    Inspection                          620
  Overhead
  Other direct charges

Manufacturing
  Direct labor                    Developmental direct labor        2,500
                                  Production direct labor             850
  Overhead
  Materials and purchased parts   Production material                 500
  Other direct charges

Purchased equipment               Purchased equipment                   5

Material overhead
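
The weight-based estimate of out-of-plant labor hours mentioned above
can be sketched as follows. This is only one possible reading: it
assumes, as our own simplification, that out-of-plant work requires the
same hours per pound as the work done in the prime contractor's plant.
The function name and all figures are hypothetical.

```python
def equivalent_inplant_hours(inplant_hours, inplant_weight_lb, outplant_weight_lb):
    """Scale prime-contractor hours to an equivalent 100-percent in-plant basis.

    Assumption (ours, not the source's): out-of-plant work takes the same
    hours per pound as the work done in the prime contractor's plant.
    """
    if inplant_weight_lb <= 0:
        raise ValueError("in-plant weight must be positive")
    hours_per_lb = inplant_hours / inplant_weight_lb
    outplant_hours = hours_per_lb * outplant_weight_lb
    return inplant_hours + outplant_hours


# Hypothetical airframe: 6,000 lb built in plant, 3,000 lb built out of plant.
total = equivalent_inplant_hours(120_000, 6_000, 3_000)
print(total)
```

Where the contractor can supply actual subcontractor hours, those would
take precedence over an estimate of this kind.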

Physical  and  Performance  Considerations 

A problem that resembles the one discussed above is the need for
consistency in definitions of physical and performance characteristics.
For example, speed can be defined in many ways: maximum speed at
optimal altitude, true speed, equivalent speed, indicated speed. All of
these defining terms differ in exact meaning and value. The weight of
an aircraft or missile depends on what is included. Gross weight,
empty weight, and airframe unit weight apply to aircraft, but each of
these terms also differs in exact meaning and value. Some agencies
include sweep volume in their definition of the physical volume of an
aircraft fire control system; others exclude it. Differences such as
these can lead an analyst unfamiliar with the equipment to use
inconsistent or varying values inadvertently. When data are being
collected from a variety of sources, an understanding of the terms used
to describe physical and performance characteristics is at least as
important as an understanding of the content of the various cost
elements.

Nonrecurring  and  Recurring  Costs 


Another  problem  that  involves  questions  of  definition  concerns 
nonrecurring  and  recurring  costs.  Recurring  costs  are  a  function  of 
the  number  of  items  produced;  nonrecurring  costs  are  not.  Thus,  for 
estimating  purposes  it  is  useful  to  distinguish  between  the  two,  and 
the CIR provides for this distinction. Unfortunately, historical cost
data frequently show such cost elements as nonrecurring and recurring
engineering  hours  as  an  accumulated  item  in  the  initial  contract.  Var¬ 
ious  analytical  techniques  have  been  developed  for  dividing  the  total 
into  its  two  components  synthetically,  but  it  is  not  clear  at  this  time 
whether  the  nonrecurring  costs  that  are  obtained  by  ex  post  facto  meth¬ 
ods  will  be  comparable  with  those  reported  in  the  CIR.  The  CIR  instruc¬ 
tions  state: 

it  is  preferable  to  identify  the  point  of  segregation  be¬ 
tween  nonrecurring  and  recurring  engineering  costs  as  a 
specific  event  or  point  in  time.  Ideally,  the  event  used 
would  be  the  point  at  which  "design  freeze"  takes  place  as 
a  result  of  a  formal  test  or  inspection,  and  after  which 
formal  Engineering  Change  Proposal  (ECP)  procedures  must 
be  followed  to  change  design.  If  no  reasonable  event  can 
be  specified  for  this  purpose,  then  all  engineering  costs 
incurred up to the date of 90 percent engineering drawing
release may be used.*

Although  it  would  be  premature  to  consider  the  kinds  of  adjustments 
needed  before  a  body  of  CIR  data  exists,  splicing  historical  data  to 
CIR  data  may  also  involve  adjustments. 

A  more  subtle  problem  arises  when  nonrecurring  costs  on  one  prod¬ 
uct  are  combined  with  recurring  costs  on  another,  i.e.,  when  the  con¬ 
tractor  is  allowed  to  fund  development  work  on  new  products  by  charging 
it  off  as  an  operating  expense  against  current  production.  This  prac¬ 
tice  is  especially  prevalent  in  the  aircraft  engine  industry.  Separa¬ 
tion  of  the  nonrecurring  and  recurring  costs  means  an  adjustment  of 
the  production  costs  shown  in  contract  or  audit  documents  to  exclude 
any  amortization  of  development.  The  nonrecurring  expense  that  has 
been  amortized  can  then  be  attributed  to  the  item  for  which  it  was  in¬ 
curred.  Such  an  adjustment  can  only  be  accomplished  in  cooperation 
with  the  accounting  department  of  the  companies  that  are  involved.  It 
would  not  be  necessary,  of  course,  for  equipment  on  which  CIR  data  are 
available. 


*U.S. Department of Defense, Cost Information Report (CIR) for
Aircraft, Missile, and Space Systems, Budget Bureau No. 22-R260,
Washington, D.C., April 21, 1966, p. 43.

Price-level Changes

Figure 1 shows the change in average hourly earnings of production
workers  on  manufacturing  payrolls  from  1920  to  1965.  Although  these 
earnings  declined  slightly  during  the  early  1920s  and  again  during  the 
Depression,  the  trend  has  been  steadily  upward  since  1934.  The  hourly 
wage  rate  has  increased  by  a  factor  of  4.75  over  a  45-year  period;  in 
other  words,  a  manufacturer  paid  $4.75  for  labor  in  1965  that  would 
have  cost  him  $1.00  in  1920.  The  implication  for  equipment  cost  is 
clear.  If  the  labor  component  of  an  automobile  cost  $500  in  1920,  the 
cost  for  the  same  car  today  would  be  something  over  $2000;  however,  the 
hours  required  in  1965  would  be  less  because  of  increased  productivity. 

The relevance of these observations to the subject of data
adjustment is that the manufacturing dates of the different hardware
items in a sample are normally spread over a period perhaps as long as
ten to fifteen years. To compare a missile built in 1955, when labor
cost about $2.35 per hour, with a missile built ten years later, when
the labor rate had increased to over $3.35 per hour, requires that the
labor cost of both be adjusted to a common base. (This problem is
obviated by dealing in hours rather than dollars, but an adjustment
would still be needed for raw material and purchased parts.) Adjustments
are made by means of a price index constructed from a time-series of
data in which one year is selected as the base and the value for that
year expressed as 100. The other years are then expressed as percentages
of this base. The hourly earnings from 1950 to 1960 for production
workers could be converted to an index using any of the years as the
base; in Table 3, 1950 and 1960 have both been used as base years.


Table 3

AVERAGE HOURLY EARNINGS INDEX

         Average Hourly    Index with 1950    Index with 1960
Year     Earnings ($)      as Base Year       as Base Year

1950         1.44               100                 64
1951         1.56               108                 69
1952         1.65               115                 73
1953         1.74               121                 77
1954         1.78               124                 79
1955         1.86               129                 82
1956         1.95               135                 86
1957         2.05               142                 91
1958         2.11               147                 93
1959         2.19               152                 97
1960         2.26               157                100

SOURCE: U.S. Department of Labor, Employment and Earnings Statistics
for the United States, 1909-66, Bulletin No. 1312-4, Washington, D.C.,
October 1966.
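
The construction of the two index columns in Table 3, and the use of
such an index to restate a cost in base-year dollars, can be sketched as
follows. The earnings figures are those of Table 3; the function names
and the $1,000 cost are illustrative assumptions, not part of the
source.

```python
# Average hourly earnings ($) of production workers, from Table 3.
EARNINGS = {
    1950: 1.44, 1951: 1.56, 1952: 1.65, 1953: 1.74, 1954: 1.78, 1955: 1.86,
    1956: 1.95, 1957: 2.05, 1958: 2.11, 1959: 2.19, 1960: 2.26,
}


def price_index(series, base_year):
    """Express each year's value as a percentage of the base-year value."""
    base = series[base_year]
    return {year: 100.0 * value / base for year, value in series.items()}


def to_base_year_dollars(cost, cost_year, index):
    """Restate a cost incurred in cost_year in dollars of the index's base year."""
    return cost * 100.0 / index[cost_year]


index_1960 = price_index(EARNINGS, 1960)
print(round(index_1960[1950]))  # 64, matching Table 3

# Hypothetical: $1,000 of labor bought in 1955, restated in 1960 dollars.
print(round(to_base_year_dollars(1000.0, 1955, index_1960), 2))
```

Dividing by the index rather than multiplying is what moves an earlier
cost up to the later base year; the same function moves a later cost
down when the base year is earlier.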


The information needed to construct a labor index is available in
the Bureau of Labor Statistics (BLS) monthly publication Employment and
Earnings, and Table 4 presents indexes based on this source.


Table 4

LABOR PRICE INDEXES

                   Aircraft     Other        Motor        Electrical
                   Engines      Aircraft     Vehicles     Equipment    Ship
                   and Engine   Parts and    and          and          and Boat
Year    Aircraft   Parts        Equipment    Equipment    Supplies     Building

1952      .59        .61         na(a)         .61          .64          .62
1953      .63        .63         na            .64          .67          .67
1954      .66        .65         na            .66          .69          .68
1955      .69        .67         na            .69          .71          .70
1956      .72        .71         na            .70          .76       [illegible]
1957      .75        .74         na            .74          .79          .79
1958      .80        .79         .79           .76          .82          .82
1959      .84        .83         .83           .81          .85          .85
1960      .86        .86         .86           .84          .88          .88
1961      .88        .89         .88           .86          .91          .93
1962      .91        .92         .91           .90          .93          .95
1963      .94        .94         .94           .93          .95          .99
1964      .95        .97         .97           .96          .97         1.00
1965     1.00       1.00        1.00          1.00         1.00         1.00

(a) Not available for years prior to 1958. For the years 1952-1957,
the labor price index for aircraft should be used.


Changes in materials costs are available in another BLS monthly
publication, Wholesale Prices and Price Indexes. These indexes can be
used to develop a materials price index for a given type of equipment by
selecting from the commodity groups in the Wholesale Price Index a list
of materials representative of those used in constructing the equipment;
these materials are then weighted according to estimates of the value of
each in fabricating the equipment. A composite aircraft raw-materials
index might be based on the following materials and weights:

Finished steel ..............  .02
Stainless steel sheet .......  .04
Titanium sponge .............  .07
Aluminum sheet ..............  .29
Aluminum rod ................  .11
Aluminum extrusions .........  .20
Wire and cable ..............  .12
Rivets, nuts, bolts .........  .15


For any given year a price index for each of these is obtained and a
composite index constructed by summing the individual index numbers
multiplied by the weightings as shown in Table 5.


Table 5

AIRCRAFT RAW-MATERIALS INDEX

                          1967 Index                 Index
Commodity                 Number(a)     Weight       Number x Weight

Finished steel              105.8         .02            2.12
Stainless steel sheet       108.0         .04            4.32
Titanium sponge              60.3         .07            4.22
Aluminum sheet               99.8         .29           28.94
Aluminum rod                110.4         .11           12.14
Aluminum extrusions          75.6         .20           15.12
Wire and cable              126.0         .12           15.12
Rivets, nuts, bolts         133.2         .15           19.98

Composite index number ..............................  101.96

(a) 1957-1959 = 100.


Weights in an index need to be updated from time to time to reflect
changing technology; it may be that those shown in Table 5 are
applicable only to current aircraft. Table 5 merely illustrates the
principle of deriving a composite index; the reader who wishes to pursue
the matter will find index numbers discussed in textbooks on economic
statistics.* Another type of composite index is used in those instances
in which labor and material costs cannot be separated and the
price-level adjustment has to be made to the total cost of an engine,
airframe, or missile. Such an index can be derived in the manner
illustrated in Table 4 with the labor and material elements weighted
according to the pattern that has been found to exist in the past (e.g.,
labor, 80 percent; materials, 20 percent). Overhead, which is a mixture
of indirect labor, materials, and items such as rent, utilities, taxes,
and fringe benefits, is adjusted in most cases by the same percentage as
direct labor. To decide whether a different adjustment factor should be
used, it would be necessary to examine each of these components.
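
The weighted sum behind Table 5, and the labor-and-materials blend just
described, can be sketched as follows. The commodity index numbers and
weights are those of Table 5; the labor index value of 130.0 is an
assumed figure used only for illustration, as are the function and
variable names.

```python
# (1967 index number, weight) per commodity, from Table 5 (1957-1959 = 100).
MATERIALS = {
    "Finished steel": (105.8, 0.02),
    "Stainless steel sheet": (108.0, 0.04),
    "Titanium sponge": (60.3, 0.07),
    "Aluminum sheet": (99.8, 0.29),
    "Aluminum rod": (110.4, 0.11),
    "Aluminum extrusions": (75.6, 0.20),
    "Wire and cable": (126.0, 0.12),
    "Rivets, nuts, bolts": (133.2, 0.15),
}


def composite_index(components):
    """Weighted sum of index numbers; the weights must sum to 1."""
    total_weight = sum(weight for _, weight in components.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(number * weight for number, weight in components.values())


materials_index = composite_index(MATERIALS)
print(round(materials_index, 2))  # 101.96, as in Table 5

# Hypothetical labor index on the same base, blended 80/20 with materials.
LABOR_INDEX = 130.0  # assumed value, for illustration only
total_cost_index = composite_index({
    "labor": (LABOR_INDEX, 0.80),
    "materials": (materials_index, 0.20),
})
print(round(total_cost_index, 1))
```

Both indexes entering the blend must be stated on the same base period
before they are combined.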


*See, for example, W. A. Spurr, L. S. Kellogg, and J. H. Smith,
Business and Economic Statistics, rev. ed., Richard D. Irwin, Inc.,
Homewood, Illinois, 1961. It is important to recognize the differences
in indexes that may result from weighting by base year or a given year,
i.e., Laspeyres' or Paasche's index. These are also discussed in
textbooks on economic statistics.

The adjustment of costs for yearly price changes is not always as
straightforward as the foregoing discussion may imply. One problem is
that price indexes are inherently inexact and their use, while
necessary, can introduce errors into the data. The average hourly
earnings for all aircraft production workers may increase by $.05 in a
given year, but at any particular company they will increase more or
less than that amount. Use of the average number to adjust the data for
a given company may bias the data up or down. Also, for many specialized
items of equipment, a good published price index does not exist. In
fact, the usual indexes are oriented toward the civilian economy and
may be misleading, i.e., they may understate the change experienced in
defense and space industries. The United States, with many other
countries, furnishes the Organization for Economic Cooperation and
Development in Paris with an index applicable to government defense
expenditures in general. This index, shown in Table 6 for 1952-1964, is
a useful reference when detailed index numbers seem questionable or are
nonexistent.


Table 6

DEFENSE EXPENDITURES INDEX, 1952-1964

         Index                 Index
Year     Number       Year     Number

1952       84         1959      102
1953       83         1960      104
1954       84         1961      105
1955       88         1962      106
1956       93         1963      108
1957       97         1964      113
1958      100

Another  problem  is  that  of  identifying  the  years  in  which  expendi¬ 
tures  occur  when  the  only  data  available  show  total  contract  cost.  Pro¬ 
duction  and  cash  flow  may  have  been  spread  out  over  a  period  of  several 
years,  and  in  principle  the  costs  should  be  adjusted  for  each  year  sep¬ 
arately.  Although  the  CIR  will  provide  the  information  needed  to  do 
this  in  the  future,  this  information  may  be  unavailable  today  and  some 
reasonable  approximation  of  the  expenditure  pattern  must  suffice. 
One method of obtaining this approximation is to use a percent-of-cost
versus percent-of-time curve of the type illustrated in Fig. 2.
These curves are developed from historical data on a number of programs
involving the same kind of hardware (large ballistic missiles in this
case) and can be used to break total research and development or total
production cost into annual expenditures. For example, to determine
the annual expenditures in a five-year R&D program amounting to a total
of $50 million, the following percentages would be obtained from the R&D
curve of Fig. 2:


    Time (percent)    Expenditures (percent)

          20                   5.5
          40                  23.0
          60                  65.0
          80                  92.0
         100                 100.0

These percentages are cumulative, of course, so the annual percentages
and the amounts they represent would be:

                  Expenditures
    Year      Percent    $ Millions

      1         5.5         2.75
      2        17.5         8.75
      3        42.0        21.00
      4        27.0        13.50
      5         8.0         4.00
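
The arithmetic behind this table can be sketched as follows. The function name `annual_expenditures` is an assumption made for illustration; the cumulative percentages and the $50 million total are the ones quoted above. Each annual percentage is the difference between successive cumulative percentages, and each annual amount is that percentage of the total.

```python
def annual_expenditures(cumulative_pct, total_cost):
    """Convert cumulative percent-of-cost readings into annual figures."""
    annual_pct = []
    previous = 0.0
    for current in cumulative_pct:
        annual_pct.append(current - previous)  # share spent in this year
        previous = current
    annual_cost = [total_cost * pct / 100.0 for pct in annual_pct]
    return annual_pct, annual_cost


# Readings from the R&D curve at 20, 40, 60, 80, and 100 percent of time.
cumulative = [5.5, 23.0, 65.0, 92.0, 100.0]
percent, millions = annual_expenditures(cumulative, total_cost=50.0)
for year, (p, m) in enumerate(zip(percent, millions), start=1):
    print(f"Year {year}: {p:5.1f} percent  ${m:5.2f} million")
```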

In  the  production  phase,  a  technique  that  can  be  used  is  to  de¬ 
velop  lag  factors  by  examining  delivery  schedules  and  production  lead 
times.  Costs  are  then  lagged  behind  delivery  dates  by  some  reasonable 
factor. 

A more fundamental question than any of those raised above is
whether yearly price changes should be made at all. It is sometimes
argued that the upward trend in wage rates has been accompanied by a
parallel trend in the output per employee or productivity rate. This
argument implies that there has been little change in the real costs
of aerospace equipment because increases in wages and materials cost
have been offset by a decrease in the number of employees required
per dollar of output. However, the real dollar output per man is
difficult to measure in an industry in which continual change rather than
standardization is the rule. Certainly the growth in productivity is
not uniform for aircraft, missiles, ships, and tanks, and to develop a
productivity index for each would be a difficult and contentious task.
Present practice, therefore, is to apply the price-level adjustment
factors to obtain constant dollars and, at the same time, to remain alert
to inequities that may be introduced by following this procedure. As
an illustration of the significance of price-level adjustments, Fig. 3
shows the effect of adjusting production costs incurred over the
period 1959-1965 (open circles) to 1962 dollars (closed circles). Both
the level of cost and the slope of the curve change as a result of the
price-level adjustment. (In this example a crossover occurs because
the year 1962 has been selected as a base for adjustment.)

Fig. 2--Percent-of-cost versus percent-of-time curves

Fig. 3--Effect of adjustment for price-level changes (horizontal axis:
quantity produced)

Cost-quantity  Adjustments 

The  cost-quantity  relationship,  discussed  at  length  in  Sec.  V, 
is  usually  known  in  the  aerospace  industry  as  the  learning  curve.  The 
cost-quantity  relationship  may  be  defined  in  brief  as  follows:  Each 
time  that  the  total  quantity  of  items  produced  doubles,  the  cost  per 
item  is  reduced  to  some  constant  percentage  of  its  previous  value. 
Whether or not this particular formulation is accepted, the fact
remains that, for most production processes, costs are a function of
quantity: As the number of items produced increases, cost normally
decreases. Thus, in speaking of cost, it is essential that a given
quantity be associated with that cost. An equipment item can be
said to cost $100,000, $80,000, $64,000, or $51,200, each at a different
quantity, and all of these numbers will be correct.

Which cost should be used by the cost analyst? The answer will
depend on a number of factors; if his purpose is to compare one missile
with another, the cumulative quantity must be the same for both
missiles. The adjustment to a specific quantity is a simple matter if the
slope of the learning curve is known or if it can be inferred from the
data. Take, for example, the costs for three missiles:

    Missile    Unit Number    Cost/Unit ($)

       A            50            1000
       B           100            1000
       C           200            1000

Although the cost is the same for each, the number of units is
different. Thus, for a cost comparison, the units must be adjusted to a
common quantity. If 100 is chosen and an 80-percent learning curve assumed
for all three missiles, the adjusted costs will be as follows:

    Missile    Unit Number    Cost/Unit ($)

       A           100             800
       B           100            1000
       C           100            1250

To project labor requirements for the 100th unit when only 50 units have
been produced is somewhat uncertain, but to ignore the cost-quantity
relationship will in most instances result in greater error than such a
projection introduces. (The learning curve is most frequently depicted
as a straight line on logarithmic scales as shown in Fig. 3.)
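
The adjustment to a common quantity can be sketched as follows, assuming the doubling formulation stated above: if each doubling of quantity multiplies unit cost by the slope s, then the cost at unit q is proportional to q raised to the power log(s)/log(2). The function name `adjust_unit_cost` is hypothetical; the missile data and the 80-percent slope come from the example in the text.

```python
import math


def adjust_unit_cost(cost, unit, target_unit, slope=0.80):
    """Move a unit cost along a learning curve from one unit number to another.

    slope is the fraction of cost remaining after each doubling of quantity
    (0.80 for an 80-percent curve).
    """
    exponent = math.log(slope) / math.log(2.0)  # negative for slope < 1
    return cost * (target_unit / unit) ** exponent


missiles = {"A": (50, 1000.0), "B": (100, 1000.0), "C": (200, 1000.0)}
adjusted = {
    name: adjust_unit_cost(cost, unit, target_unit=100)
    for name, (unit, cost) in missiles.items()
}
for name, cost in adjusted.items():
    print(name, round(cost))
```

Missile A moves forward one doubling (1000 x 0.80), while Missile C moves back one doubling (1000 / 0.80), which is how the adjusted table arrives at 800 and 1250.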

Other  Possible  Cost  Adjustments 

The  lack  of  a  way  to  adjust  cost  data  for  productivity  changes 
over  time  is  illustrative  of  the  current  situation  in  which  more  kinds 
of  cost  adjustments  have  been  theorized  than  have  been  quantified. 

For example, it has been suggested that adjustment may be required
because of differences in contract type (fixed-price,
fixed-price-incentive, cost-plus-fixed-fee contracts) or differences in
the type of procurement (competitive bidding or sole source). The
hypothesis is that the type of contract or procurement procedure will
bias costs up or down, but this hypothesis is difficult to substantiate.

Another question concerns manufacturing techniques. What are the
effects of varying amounts of capital investment or capital
improvement and of changes in manufacturing state of the art? A related
question concerns the efficiency of the contractor. It may be surmised
that Contractor A has been a lower cost producer than Contractor B on
similar items, but this is extremely difficult to prove. A low-cost
producer may be one who, because of his geographical location, pays
lower labor rates. Contractors in Fort Worth, Texas, and in Atlanta,
Georgia, may have a considerable advantage in this regard over their
competitors in Los Angeles and San Francisco, California, and in Seattle,
Washington. Table 7 does not give a fair picture of comparative rates
because differences among industries in the various cities tend to be
more important than differences in location. But, even for two cities as
close together as Los Angeles and San Francisco, labor rates differ by
10 percent. Thus, although it might not be possible to adjust cost data
on the basis of contractor efficiency, adjustments can be made for
differences in location by using the specific area labor rates.


Table 7

AVERAGE HOURLY EARNINGS OF PRODUCTION WORKERS
ON MANUFACTURING PAYROLLS, NOVEMBER 1965
(in dollars)

    Atlanta ............ 2.69
    Boston ............. 2.69
    Chicago ............ 2.91
    Detroit ............ 3.45
    Los Angeles ........ 3.04
    New Orleans ........ 2.72
    New York ........... 2.63
    Philadelphia ....... 2.79
    St. Louis .......... 2.96
    San Francisco ...... 3.35
    Seattle ............ 3.25

SOURCE: U.S. Department of Labor, Bureau
of Labor Statistics, Employment and Earnings,
Washington, D.C., January 1966.


III.  STATISTICAL  METHODS  IN  DEVELOPMENT  OF 
ESTIMATING  RELATIONSHIPS 

MANY  ESTIMATING  RELATIONSHIPS  are  simple  statements  that  indicate  that 
the  cost  of  a  commodity  is  directly  proportional  to  the  weight,  area, 
volume,  or  other  physical  characteristic  of  that  commodity.  These 
estimating  relationships  are  simple  averages;  they  are  useful  in  a  vari¬ 
ety  of  situations  and,  because  of  their  simplicity,  they  require  little 
explanation.  In  this  section,  the  statistical  considerations  involved 
in  developing  cost-estimating  relationships  for  advanced  equipment  are 
examined. The emphasis is on the derivation of more complex
relationships, i.e., equations that are able to reflect the influence on
cost of more than one variable. The intent is to illustrate a general
approach to the development of such relationships and to introduce basic
concepts of statistical analysis. The emphasis is not on statistics
per se; the basic statistical theory as well as the computational
aspects involved in developing these relationships are included only to
clarify practical considerations. Statistical analysis can help
provide an understanding of factors that influence cost, but estimating
relationships  are  no  substitute  for  understanding;  regression  analysis, 
which  will  be  discussed  in  this  study,  does  not  offer  a  quick  and  easy 
solution  to  all  the  problems  of  estimating  cost. 

The outstanding characteristic of a cost factor is that the
relationship between cost and the explanatory variable is direct and
obvious; thus, cost per pound is widely used because of the generally
satisfying thesis that as a ship, tank, or airplane increases in weight
it becomes more costly. Weight changes alone do not always adequately
explain cost changes, however, and additional explanatory variables are
often needed. The problem is to find these variables and their
relationship to cost. The procedure is to decide what variables are
logically or theoretically related to cost and then to look for patterns
in the data that suggest a relationship between cost and the variables.
Table 1 contains a set of data on cost and selected variables that can
be analyzed for such patterns. The costs of ten airborne radio
communication sets are given with the weight, power output, and frequency
of each. It is to be expected that cost would increase with weight or
with power output. Frequency is also included because in the past
higher and higher frequencies have been sought to increase
communication capacity and, for a given power output, higher frequency
sets have been more costly.

A graphic analysis of the data in Table 1 shows that cost is not
a simple linear function of any of the three explanatory variables.
Cost tends to increase with weight, but there are notable exceptions
to the trend, as illustrated by the scatter diagram of Fig. 1. Cost
plotted against power output as shown in Fig. 2 is even less promising,
partly because the arithmetic scale does not enable an observer to
distinguish among the points between .5 and 30 watts. The change from an
arithmetic to a logarithmic scale shown in Fig. 3 spreads the points in
the low-power range and indicates that a trend may exist, but with a
very wide scatter.


Table 1

TEN AIRBORNE RADIO COMMUNICATION SETS

    Cost      Weight    Power Output    Frequency
    ($)       (lb)      (w)             (MHz)

    22,200      90         20             400
    17,300     161        400              30
    11,800      40         30             400
     9,600     108         10             400
     8,800      82         10             400
     7,600     135        100              25
     6,800      59          6             400
     3,200      68          8             156
     1,700      25          8              42
     1,600      24          0.5           258



The wide scatter in Fig. 3 is explained in part by recognizing the
effect of frequency. In Fig. 4, each point is identified by frequency
class: High Frequency (HF), up to 30 MHz; Very High Frequency (VHF),
30 to 300 MHz; and Ultra High Frequency (UHF), above 300 MHz. A clearer
relationship exists between cost and power output within each frequency
class than exists for the whole sample scattered without regard to
frequency. This suggests that the sample is not homogeneous. Each
frequency band may constitute a separate sample, or possibly HF and VHF
costs are on one level and UHF costs are on another.

At this point, it is not clear if any of the explanatory variables,
either singly or in combination, will yield a useful estimating
relationship, or if a single relationship can serve for all frequencies.
To illustrate techniques that are commonly employed in deriving
estimating relationships, assume that cost can be related to a single
predictive variable, that of weight. The results of a linear normal
simple regression model will then be examined. Later, several variables
in a multiple regression analysis will be considered, and the problem of
the apparent nonhomogeneous character of the sample illustrated in
Fig. 4 will be reexamined.

Fig. 4--Identification by frequency class

Regression  has  become  a  widely  accepted  tool  for  cost  analysis, 
and  it  is  frequently  used  to  develop  estimating  relationships.  The 
technique  of  regression  analysis  can  be  thought  of  as  consisting  of  two 
distinct  stages.  The  first  is  that  of  estimating  the  constant  and  co¬ 
efficients  of  the  equation,  and  the  second  is  that  of  inferring  the  re¬ 
liability  and  significance  of  the  results  of  the  estimate  on  the  basis 
of  assumed  (and  to  a  degree  verifiable)  properties  possessed  by  the 
data and the results. Regression analysis as a technique is applicable
only to the two stages performed together. Estimating coefficients or
curve fitting is simply a mathematical exercise. Only when these
estimating procedures are used as a basis for making statistical
inferences can they be viewed as part of a regression analysis.


Simple  Linear  Regression 

The form of the relationships between cost and the explanatory
variable(s) depends on the problem. It may reflect either an
underlying physical law or a structural relationship. When no particular
functional form is suspected, a simple (two-variable) linear model is
frequently used to describe the relationship between two variables.
In this case, the equation of the model is

    y = a + bx,    (1)

where y is the dependent variable and x is the explanatory variable.
The symbols a and b are the constant and coefficient, respectively, of
the equation estimated from the data. Here y could represent the cost
of a radio communication set and x could represent the weight. If it
is assumed that b is greater than zero, the model indicates that heavier
equipment will cost more than lighter equipment. When the values of a
and b are known, it is possible to compute y (cost) for any given value
of x (weight).

Least-squares  Estimating 

Given Eq. (1), the basic problem in the first phase of the
regression analysis is to derive estimates of the parameters a and b. The
standard procedure is the method of least squares. The values of a
and b are determined by the requirement that the sum of the squares of
the deviations of the sample observations from the estimated line be
at a minimum. Symbolically, this minimum is expressed as

    \min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,    (2)

where y_i is the ith observation and \hat{y}_i is the value of y_i
estimated from the equation

    \hat{y}_i = \hat{a} + \hat{b} x_i.    (3)

The carets over a and b indicate that \hat{a} and \hat{b} are
least-squares estimates of the true but unknown values of a and b. Thus
\hat{y}_i is the least-squares estimate of y_i, and the term
(y_i - \hat{y}_i) indicates the difference between each observed y_i and
the corresponding estimated value \hat{y}_i. This is illustrated in
Fig. 5, which shows the actual (y) and estimated (\hat{y}) value of the
dependent variable that corresponds to a specific value of the
explanatory variable x. The line shown in Fig. 5 is the line that
represents Eq. (3). All of the estimated values \hat{y}_i fall on this
line. The vertical distance from point A to point B is the difference
between the actual value (y) and the estimated value (\hat{y}). The sum
of the squares of all such differences (as in Eq. (2)) is the quantity
to be minimized in estimating the line.

The minimum value of this sum is found by substituting Eq. (3) in
Eq. (2), taking the partial derivatives of Eq. (2) with respect
to a and b, and setting the results equal to zero. This process yields
two equations that are called normal equations and that can be solved
for a and b:


    \sum y = na + b \sum x,

    \sum xy = a \sum x + b \sum x^2,

where y = cost of airborne radio equipment in thousands of dollars,
      x = weight of airborne radio equipment in pounds,
      n = number of items in the sample,
      \sum = summation (e.g., \sum y = the sum of all y's).

Table 2 contains the numerical values and totals required to solve the
normal equations when data from Table 1 are used. The costs are
expressed in thousands of dollars.

Table 2

DATA FOR REGRESSION ANALYSIS OF COST AND WEIGHT

             x         y        x^2          xy

            90      22.2      8,100     1,998.0
           161      17.3     25,921     2,785.3
            40      11.8      1,600       472.0
           108       9.6     11,664     1,036.8
            82       8.8      6,724       721.6
           135       7.6     18,225     1,026.0
            59       6.8      3,481       401.2
            68       3.2      4,624       217.6
            25       1.7        625        42.5
            24       1.6        576        38.4

    Total  792      90.6     81,540     8,739.4

When the values from Table 2 are substituted in the normal equations,
the following expressions are obtained for the sample data points
(n = 10):

    90.6 = 10a + 792b,

    8739.4 = 792a + 81,540b.

Solved simultaneously, these equations give*

    a = 2.477,
    b = .083,

and thus from Eq. (3)

    \hat{y} = 2.477 + .083x.    (4)
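
The solution of the normal equations can be checked with a short Python sketch. The variable names are assumptions chosen for illustration; the weights and costs are those of Table 2. The two normal equations are solved in closed form by eliminating a.

```python
# Weight (lb) and cost ($ thousands) of the ten sets in Table 2.
x = [90, 161, 40, 108, 82, 135, 59, 68, 25, 24]
y = [22.2, 17.3, 11.8, 9.6, 8.8, 7.6, 6.8, 3.2, 1.7, 1.6]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Normal equations:  sum_y  = n*a     + b*sum_x
#                    sum_xy = a*sum_x + b*sum_x2
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

print(f"a = {a:.3f}, b = {b:.3f}")
```

The second line of the solution is simply the first normal equation divided by n and rearranged, so the fitted line passes through the point defined by the two sample means.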

The  line  represented  by  this  equation  is  shown  in  Fig.  6  as  the 
solid  line  with  the  actual  observations  plotted  as  dots.  The  extent 
of  the  dispersion  of  the  observations  relates  inversely  to  the  useful¬ 
ness  of  the  line  as  a  tool  for  estimating  the  values  of  y  from  the 
values  of  x.  The  greater  the  dispersion  of  observed  values  of  y  about 
the  line,  the  less  accurate  the  estimates  that  are  based  on  the  line 
are  likely  to  be.  The  measure  of  the  dispersion  about  the  regression 
line  is  called  the  standard  error  of  estimate  (SE)  of  the  equation  and 
is  shown  by  the  dashed  lines. 

One measure of dispersion in a collection of data points is called
the variance. The variance is defined as the sum of the squared
distances to each of the data points from a central reference point
divided by the degrees of freedom (df), which equal the number of
independent bits of information contained in the sample. (In analyzing
the data that are given in Table 1, the degrees of freedom equal (n - 2);
i.e., the number of observations n less the number of constraints, 1 each
for a and b.)

* Slight variations may exist in the last significant figure in the
examples throughout this section because of rounding and logarithmic
transformations.

Fig. 6--Regression line and standard error of estimate (horizontal
axis: weight, lb)

In least-squares procedures, the central point of reference for
calculating the variance of each variable is its sample mean, which
causes the least-squares line to have the property of passing through
the means of the variables used to estimate the line. This
characteristic is shown in Fig. 5; it can be verified by dividing both
sides of the first normal equation by n, since the sample mean of any
variable u is

    \bar{u} = \frac{\sum u}{n}.    (5)

By referring to Fig. 5, it can be seen that the total distance
from y_i to \bar{y} for any observation on y is the distance from C to B.
The sum of all such distances squared and divided by the degrees of
freedom is called the total variance of y:

    Total variance of y = \frac{\sum (y_i - \bar{y})^2}{n - 1}.    (6)

The distance from C to A indicates the amount of the total deviation
of y from \bar{y} which is explained by the estimating relationship.
Consequently, the sum of the distances from \bar{y} to the line, squared
and divided by the degrees of freedom, is called the explained variance:

    Explained variance of y = \frac{\sum (\hat{y}_i - \bar{y})^2}{n - 2}.    (7)

The remaining distance from A to B is the residual or unexplained
deviation from y_i to \hat{y}_i, or the unexplained variance:

    Unexplained variance of y = \frac{\sum (y_i - \hat{y}_i)^2}{n - 2}.    (8)

The standard error of estimate is defined as the square root of the
unexplained variance of the y's:

    SE = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}.    (9)

For the equation \hat{y} = 2.477 + .083x, the standard error of estimate
is $5,808. This value has been plotted above and below the regression
line in Fig. 6. The interpretation and significance of these results
will be discussed in connection with the use of prediction intervals.

In comparing one SE with another, it is useful to compute a relative
standard error of estimate. One such measure is the coefficient of
variation (CV), which relates the SE to the mean of the sample y's:

    CV = \frac{SE}{\bar{y}}.    (10)

Continuing the analysis of the data in Table 1, the mean of the y's is
$9,060. Therefore, the value of CV is

    CV = \frac{\$5,808}{\$9,060} = .641.

This value is high. Although the question of reliability of an
estimating equation is relative to the context in which the equation is
to be used, a value at least as small as 10 to 20 percent for the
coefficient of variation is desirable.
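
The standard error of estimate and the coefficient of variation quoted above can be reproduced with the following sketch. It assumes that the residual sum of squares is divided by n - 2 degrees of freedom, as stated for this sample, and the helper name `least_squares_line` is hypothetical. Costs are in thousands of dollars, so an SE of about 5.808 corresponds to $5,808.

```python
import math

# Weight (lb) and cost ($ thousands) of the ten sets in Table 2.
x = [90, 161, 40, 108, 82, 135, 59, 68, 25, 24]
y = [22.2, 17.3, 11.8, 9.6, 8.8, 7.6, 6.8, 3.2, 1.7, 1.6]


def least_squares_line(x, y):
    """Return the constant a and coefficient b of the least-squares line."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    return (sy - b * sx) / n, b


a, b = least_squares_line(x, y)
n = len(x)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

unexplained_variance = sum(r * r for r in residuals) / (n - 2)
se = math.sqrt(unexplained_variance)  # standard error of estimate
cv = se / (sum(y) / n)                # coefficient of variation

print(f"SE = {se:.3f} ($ thousands), CV = {cv:.3f}")
```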

The standard error of estimate gives a measure of the magnitude of
the unexplained variance. Another related measure of dispersion is
given by the coefficient of determination that shows the proportion of
total variance accounted for by the estimating relationship:

    Coefficient of determination
        = \frac{\text{Explained variance}}{\text{Total variance}}
        = 1 - \frac{\text{Unexplained variance}}{\text{Total variance}}.    (11)

When all the observed points in the sample are on the least-squares
line, the coefficient of determination equals 1 and there is no
unexplained or residual variance. As the proportion of total variance
that remains unexplained increases, the coefficient of determination
approaches zero. The square root of the coefficient of determination is
called the correlation coefficient.* Correlation has no substantive
meaning unless both the dependent and explanatory variables are assumed
to be normal random variables. The ordinary assumption in using
regression analysis for developing estimating relationships is that only
the dependent variable is random. Consequently, it is not considered good
practice for the correlation coefficient to be used in documenting the
results in this particular application of regression analysis. The
inclusion of the correlation coefficient, however, causes no serious
problem since it is simply the square root of the coefficient of
determination. When analysts review the results, they can easily
calculate the latter from the former. Since the coefficient of
determination is always in the range between zero and one, its square
root will always be larger, except at the boundary points of zero and one.

* Since total variance, Eq. (6), and the standard error, Eq. (9),
have been adjusted for degrees of freedom, the resulting correlation
coefficient, the square root of Eq. (11), is also adjusted. Some
computer programs do not adjust; the variance figures are then biased
downward and the correlation coefficient will appear larger than in the
adjusted case. The practical implications of these adjustments are
minimal except in extremely small sample cases. However, to fully
understand the results, the analyst should know whether the total
variance, standard error, and correlation coefficient are adjusted in any
particular program or set of results. A discussion of adjustments for
degrees of freedom is given in M. J. B. Ezekiel and K. A. Fox, Methods
of Correlation and Regression Analysis, 3d ed., John Wiley & Sons,
Inc., New York, 1959, pp. 300-305.

The coefficient of determination for Eq. (4) is .325, which is
relatively low and further substantiates the evidence that weight alone
is not a good predictor of the cost of airborne radio communication
equipment.


Statistical Inference

The standard error of estimate, the coefficient of variation, and
the coefficient of determination indicate the degree of accuracy with
which the estimating equation describes the sample observations.
However, the analyst is primarily interested in using the estimating
equation to predict costs among the population of items that the sample
represents; the standard error of estimate and the coefficient of
determination do not furnish a good measure of the reliability of the
estimating equation for predictive purposes.

The problem of reliability raises other considerations. First,
the question arises whether x and y are actually related in the manner
indicated by the regression equation. A particular sample could show
such a relationship out of pure chance when, in fact, none exists.
Second, the regression equation obtained from the sample is one of a
family that could be obtained from different samples within the same
population. Finally, when the equation is used to estimate a value for y
based on an x that is outside the range of the sample, the reliability
of the estimate of y may be suspect because the estimated relationship
may not hold beyond the sample range or because the x is a point from
a different population rather than an extrapolation from the sample.

An example of an extrapolation for which the relationship might not
hold is that of an aircraft that is much larger than any in the sample.
The problem of moving to a new population appears in a case in which
an aircraft is to be constructed of titanium when the sample contains
only  aluminum  aircraft.  In  the  latter  case,  if  a  substitution  of  tita¬ 
nium  for  aluminum  is  expected  to  increase  the  cost,  the  estimating  rela¬ 
tionship  developed  from  the  aluminum  sample  may  be  used  by  an  experi¬ 
enced  analyst  as  an  approximate  indicator  of  the  lower  bound;  however, 
adjustments  based  on  such  personal  judgments  are  not  a  part  of  statisti¬ 
cal  theory. 

Statistical inference may be used to answer the two questions that
arise in connection with the problem of reliability. To decide whether
x and y are actually related, test for statistical significance; to
evaluate predictions, establish a prediction interval for the
regression line. However, certain assumptions and conditions must be met
before standard techniques of statistical inference and testing can be
validly applied to least-squares results; namely, the data are assumed
to be a sample taken from a larger population, which meets the following
conditions:

1. The x values are nonrandom (fixed) variables.

2. The residual deviations are independent random variables
with normal distributions.

3. The expected value of the distribution of each of these
random variables is zero, and the unknown variance is the
same for all values of x.
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Under  these  assumptions,  the  hypothesized  relationship  between  y  and  x 
becomes 


y  .■=  a  +  bx .+  u.. 


(12) 


where  i  =  (1, 

=  the  normally  distributed  random  error  terms  with  zero 
expected  value  and  a  common  and  unknown  variance. 

Further,  under  these  acsumptions,  the  least-squares  method  produces  un¬ 
biased  maximum  likelihood  estimators.  Standard  statistical  techniques 
can  be  applied  to  the  least-squares  results  to  test  for  significance 

and  to  make  Inferences  about  reliability  and  accuracy  in  a  probabilis- 
* 

tic  sense.  A  graphic  Illustration  of  these  assumptions  as  they  relate 
to  the  simple  (two-variable)  regression  case  is  shown  in  Fig.  7. 

Although the subject of statistical testing is too complex to
treat comprehensively here, the method of testing the significance of
the relationship between x and y in the simple regression of Fig. 6
will be examined briefly. Basically, the procedure involves establishing
the null hypothesis that x and y are not related (i.e., that b = 0)
and testing to determine whether the hypothesis should be rejected.
The test that is commonly used for this purpose is known as the t-test
because it uses the t-ratio, or ratio of a coefficient to its standard
error. For this simple regression, the ratio is expressed as


\[ t_b = \frac{b}{s_b}, \qquad (13) \]

where b = the estimated regression coefficient (from the equation
\(\hat{y} = a + bx\)),

\(s_b\) = the standard error of b,
\(s_b = SE / \sqrt{\sum (x_i - \bar{x})^2}\),

SE = the standard error of estimate as defined in Eq. (9).

The value of \(t_b\) for Eq. (4) is 1.96.

*A more comprehensive statement of these assumptions and considerations
is given in W. A. Spurr and C. P. Bonini, Statistical Analysis
for Business Decisions, Richard D. Irwin, Inc., Homewood, Illinois,
1967, pp. 564-565; A. M. Mood, Introduction to the Theory of Statistics,
McGraw-Hill Book Company, Inc., New York, 1950, pp. 152-154; and John
Johnston, Econometric Methods, McGraw-Hill Book Company, Inc., New York,
1963, pp. 3-9.
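As a rough numerical check of the t-ratio, the sketch below recomputes it from quantities quoted in this section, using the usual simple-regression expression for the standard error of b (SE divided by the square root of the sum of squared deviations of x). The slope is an assumption: it is inferred from the predicted costs quoted later for 100 lb and 200 lb rather than read from Eq. (4) itself, and the function name is illustrative.

```python
import math

def t_ratio(b, se, sum_sq_dev):
    """Ratio of a regression coefficient to its standard error."""
    s_b = se / math.sqrt(sum_sq_dev)   # standard error of b
    return b / s_b

# Assumed slope, inferred from the predictions quoted in the text:
# (19.077 - 10.777) thousand dollars over (200 - 100) lb.
b = (19.077 - 10.777) / (200 - 100)

# SE and the sum of squared deviations are the values quoted in the text.
t_b = t_ratio(b, se=5.808, sum_sq_dev=18813.6)
print(round(t_b, 2))
```

The result agrees with the value of 1.96 reported for the example, which suggests the inferred slope is consistent with the estimating equation.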

A standard table of t-ratios is required to use Eq. (13) to test
the null hypothesis.* The relevant row is shown in Table 3. If the
calculated t-ratio falls below the appropriate value of t selected
from this table, the null hypothesis that b = 0 would be accepted, and
it would be concluded that b is, in fact, not significantly different
from zero. The level of significance above each of the t-values
indicates the probability that the calculated t-ratio could be as high
strictly by chance as the values that are shown in the table. In other
words, these levels of significance indicate the probability that the
null hypothesis will be rejected when it is true.


Table 3

VALUES OF t-RATIOS FOR 8 DEGREES OF FREEDOM
(One-sided Test)

Degrees of                Level of Significance
Freedom        .20      .15      .10      .05      .025     .01

                                t-Ratio

8              .889     1.108    1.397    1.860    2.306    2.896

*All of the references in the Bibliography to this section contain
t-tables.

Fig. 7 - Simple linear population regression model

If there were evidence to justify the assumption that the sign of
the coefficient could be only positive (or only negative) if it were
different from zero, the level of significance associated with each t
could be read directly from Table 3. However, the common practice in
regression analysis is not to make this assumption, but to test as
though the value of t (if it were different from zero) could be either
positive or negative. Because of the symmetry of the distribution of
the t-ratios, the level of significance for the two-sided test is twice
the level of significance for the one-sided test. Thus, the levels of
significance of the t-values shown in the table are only half the actual
levels for the two-sided test. For example, the value 1.86 has a level
of significance of .05. For the two-sided test, double this amount and
read the level of significance as .10. In the two-sided test, the
probability is 10 percent that the absolute value of the calculated
t-ratio is as large as 1.86 when b is actually equal to zero. Since in
the example the calculated t-ratio is 1.96, if the required level of
probability for rejecting the null hypothesis when it is true is as
high as 10 percent but no higher, the hypothesis that b = 0 is rejected,
and the relationship is considered significant. On the other hand, if a
.05 level of significance (t = 2.306) seems appropriate, the hypothesis
must be accepted. In this case, the coefficient of x, and therefore the
equation, is considered as not significant.*
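The decision rule described above can be sketched as follows. The dictionary holds the one-sided row of Table 3; the helper name and the way the rule is packaged are illustrative assumptions rather than anything prescribed by the memorandum.

```python
# One-sided critical values for 8 degrees of freedom (Table 3).
ONE_SIDED_T_8DF = {0.20: 0.889, 0.15: 1.108, 0.10: 1.397,
                   0.05: 1.860, 0.025: 2.306, 0.01: 2.896}

def reject_null_two_sided(t_calc, two_sided_level):
    """Reject b = 0 when |t| reaches the two-sided critical value.

    The two-sided level is twice the one-sided level, so the critical
    value is looked up at half the requested two-sided level.
    """
    one_sided_level = round(two_sided_level / 2, 3)
    critical = ONE_SIDED_T_8DF[one_sided_level]
    return abs(t_calc) >= critical

at_10_percent = reject_null_two_sided(1.96, 0.10)  # compares with 1.860
at_05_percent = reject_null_two_sided(1.96, 0.05)  # compares with 2.306
print(at_10_percent, at_05_percent)
```

With the example value of 1.96, the hypothesis is rejected at the two-sided .10 level but not at the .05 level, matching the discussion in the text.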

The question at this point is, What should the level of significance
be for rejecting the hypothesis? Unfortunately, no simple answer
is possible. The values of .10, .05, and .01 are those that are
most commonly used, but the analyst must make a decision based on the
risk that is assumed when a true hypothesis is rejected. For the
purpose of this discussion, we will accept a value of .10 in testing
significance and in establishing a prediction interval for the
regression line.

Prediction  Intervals 

The procedure for calculating the prediction interval for a simple
regression is as follows. For a given value of the explanatory
variable, say x, the estimating equation is used to obtain a predicted
value of the dependent variable:

\[ \hat{y} = a + bx. \qquad (14) \]

The prediction interval puts a boundary around \(\hat{y}\):

\[ \hat{y} \pm A_{e/2}. \qquad (15) \]

There is a certain level of confidence (1 - e) that the cost of a set
weighing x will be in that interval.

*A more comprehensive discussion of the use of statistical tests
is given in W. A. Wallis and H. V. Roberts, Statistics, The Free Press,
New York, 1963, pp. 399-402, 413-426.

For further discussion, see W. A. Spurr, L. S. Kellogg, and
J. H. Smith, Business and Economic Statistics, Richard D. Irwin, Inc.,
Homewood, Illinois, 1961, pp. 251-255.
Values for e/2 rather than e are used since \(\hat{y}\) is to be
bounded on both sides. The values of e can be divided by two since,
under the assumptions, the probability distribution about \(\hat{y}\)
is normal and therefore is symmetrical. In statistical terminology, a
two-tailed t distribution is used for constructing the intervals.

In the case of simple regression, a 100(1 - e)-percent prediction
interval for an estimated value of the dependent variable can be
constructed as follows:

\[ \hat{y} \pm A_{e/2}, \qquad (16) \]

where

\[ A_{e/2} = t_{e/2}\,(SE)\sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}, \qquad (17) \]

and where SE = the standard error of the estimating equation from which
\(\hat{y}\) was obtained,

\(t_{e/2}\) = the value obtained from a table of t-values for the e/2
significance level,

n = the size of the sample,

x = the specified value of the explanatory variable used as
a basis for obtaining \(\hat{y}\),

\(\bar{x}\) = the mean of the x's in the sample,

\(\sum (x_i - \bar{x})^2\) = the sum of the squared deviations of the
sample x's from their sample mean.

When the estimating equation derived previously is used, the cost
of a communications set weighing 100 lb is estimated at $10,777. To
establish around this value a 90-percent prediction interval (i.e.,
one with a 10-percent level of significance), the necessary data are

SE = 5.808,
e = 0.1,
e/2 = 0.05,
\(t_{.05}\) = 1.86,
n = 10,
df = 8,
x = 100 lb,
\(\bar{x}\) = 79.2 lb,
\(\sum (x_i - \bar{x})^2\) = 18,813.6 lb^2.

By substituting these data in Eq. (17), solving for \(A_{.05}\), and
multiplying by 1000, we obtain

\[ A_{.05} = \$11,447. \]

Therefore, for x = 100 lb, the 90-percent prediction interval in
dollars is

\[ \hat{y} \pm A_{.05} = \$10,777 \pm \$11,447. \]
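The calculation above can be reproduced with a short sketch. The formula coded here is the standard prediction-interval half-width for simple regression, which is assumed to be what Eq. (17) expresses; the function and argument names are illustrative.

```python
import math

def half_width(t, se, n, x, x_mean, sum_sq_dev):
    """Half-width of the prediction interval around the estimate at x."""
    return t * se * math.sqrt(1 + 1 / n + (x - x_mean) ** 2 / sum_sq_dev)

# Data quoted in the text (costs in thousands of dollars).
a_05 = half_width(t=1.86, se=5.808, n=10, x=100, x_mean=79.2,
                  sum_sq_dev=18813.6)

y_hat = 10.777  # estimated cost at 100 lb, thousands of dollars
lower, upper = y_hat - a_05, y_hat + a_05
print(f"half-width = {a_05:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

Multiplying the half-width by 1000 gives a figure within a dollar or so of the $11,447 quoted in the text; the small difference comes from rounding of the inputs.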

The percentage 100(1 - e) is the confidence level of the prediction
intervals, which means that if repeated observations on the cost of
communications sets that weigh 100 lb were taken, 100(1 - e) percent
of the time these observations would lie within the range set by the
100(1 - e)-percent prediction intervals. This is the only sense in
which a level of confidence can be associated with prediction
intervals. It is erroneous to infer that there is a 100(1 - e)-percent
probability that the actual value for any particular case will lie
within the interval.

Further, prediction intervals are valid outside the range encompassed
by the sample data that are used to generate the estimating
relationship and the interval only if the estimating relationship is
itself valid outside that range. For example, if there were occasion
for the line to curve up or down or if a discontinuity in the form of
a discrete jump in cost occurred for weights outside the sample range,
this fact would not be reflected in the prediction interval. Thus, it
must be clearly indicated when the intervals are used for estimates
based on values outside the sample range.

This prediction interval procedure can be repeated for other values
of x and the results plotted to obtain a 90-percent prediction
interval band around the regression line, as shown in Fig. 8. In this
case, the 90-percent confidence region is fairly wide because of the
relatively large standard error of this equation. The formula for the
prediction interval is such that the width of the interval is sensitive
to the size of the standard error; large standard errors indicate that
much of the cost variation in the observed data is unexplained by the
equation.


[Figure omitted; horizontal axis: Weight (lb)]

Fig. 8 - The 90-percent prediction interval band
for estimated costs based on sample data


The prediction interval becomes wider as values of x that are farther
from the mean of the sample are selected. From Eq. (4), the prediction
interval (multiplied by 1000) for the mean 79.2 lb is $9,051
± $11,329; for x = 200 lb, the prediction interval is $19,077 ± $14,794.
In the latter case, the width of the interval is about 1.3 times the
width for the mean weight. This change in the size of the prediction
interval occurs because the formulas are derived to allow for the
possibility that the estimated values \(\hat{a}\) and \(\hat{b}\) differ
from the true values of a and b. Such a situation can occur when the
sample data contain chance fluctuations that prevent the data from
reflecting the true relationship that exists in the total population or
when there are not sufficient data in the sample.
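The factor of about 1.3 can be traced to the square-root term of the interval formula. As a worked check, assuming the standard simple-regression form with n = 10, a sample mean of 79.2 lb, and the sum of squared deviations of 18,813.6 quoted earlier:

```latex
\sqrt{1 + \tfrac{1}{10} + \tfrac{(79.2 - 79.2)^2}{18{,}813.6}} \approx 1.049,
\qquad
\sqrt{1 + \tfrac{1}{10} + \tfrac{(200 - 79.2)^2}{18{,}813.6}} \approx 1.370,
\qquad
\frac{1.370}{1.049} \approx 1.31 .
```

Since t and SE are the same at both weights, this ratio is also the ratio of the two half-widths, 14,794 / 11,329.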

Figure 9 illustrates the way in which errors in the estimates
\(\hat{a}\) and \(\hat{b}\) affect the accuracy of estimates. The solid
line represents the true relation between x and y. The dashed line
represents an equation in which the estimated values \(\hat{a}\) and
\(\hat{b}\) differ from the true values. The figure shows that the
effect of these errors increases with movement toward the extreme
ranges of x.

Fig. 9 - Effects of estimating errors on accuracy
of predictions

The width of the prediction interval is also sensitive to the
level of confidence that is specified and to the number of degrees of
freedom. That level was set at 90 percent (i.e., e/2 = 0.05). Suppose
that only a 70-percent level of confidence is required (e/2 = 0.15).
The only change in the inputs used in the previous calculations is the
value of t. With a 90-percent level of confidence, \(t_{.05}\) = 1.86;
with a 70-percent level, \(t_{.15}\) = 1.11. This change will make a
difference in the width of the prediction interval. Since the level of
confidence is lower, the prediction interval is narrower; for lower
levels of confidence, the band will be even more narrow. For e = .10
and the degrees of freedom = 8, the value of \(t_{e/2}\) = 1.86. If the
degrees of freedom were 16, \(t_{e/2}\) = 1.746. Thus, if there are
twice as many degrees of freedom for an equation with the same standard
error, the prediction interval for e = .10 is smaller. However, the
difference in prediction interval size because of differences in
degrees of freedom is more significant for small samples than for large
samples; the value of t for any given level of significance becomes
almost constant for degrees of freedom over 30. For example, the
smallest value of \(t_{e/2}\) for e = .10 is 1.645.
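Because only t changes in these comparisons, the relative effect on interval width is simply a ratio of t-values. The sketch below uses the values quoted in the paragraph and, as a simplifying assumption, holds the standard error and the square-root term of the interval formula fixed; the dictionary layout and names are illustrative.

```python
# t-values quoted in the text, keyed by (confidence level, degrees of freedom).
T_VALUES = {
    ("90%", 8): 1.86,         # e/2 = .05, 8 degrees of freedom
    ("70%", 8): 1.11,         # e/2 = .15, 8 degrees of freedom
    ("90%", 16): 1.746,       # e/2 = .05, 16 degrees of freedom
    ("90%", "large"): 1.645,  # limiting value for very large samples
}

baseline = T_VALUES[("90%", 8)]

# Width of each interval relative to the 90-percent, 8-df case.
relative_width = {key: t / baseline for key, t in T_VALUES.items()}

for key, ratio in relative_width.items():
    print(key, f"{ratio:.3f}")
```

On these figures, dropping from 90 to 70 percent confidence cuts the width by roughly 40 percent, while doubling the degrees of freedom trims it by only about 6 percent.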

Before concluding this section, there are two additional points
to be made. First, even when the coefficient of determination \(r^2\)
is high, it is possible for the standard error of estimate to be large.
This is explained by the fact that \(r^2\) is based on a proportion and
the standard error is based on an absolute quantity:

\[ r^2 = \frac{\text{Explained variance}}{\text{Total variance}}, \]

\[ SE = \sqrt{\text{Unexplained variance}}. \]

Thus, even if the explained variance represents a high fraction of the
total variance, it is possible for the unexplained variance to be large
relative to the estimated cost. This outcome would be indicated by the
coefficient of variation.
Second, the statistical significance of regression relationships
does not necessarily imply existence of a causal relationship. The
following excerpt from an Institute of Defense Analyses (IDA)
memorandum* illustrates the importance of this distinction in cost
analysis:

Frequently during cost effectiveness studies, the distinction
between a "causation" cost model and a "correlation"
cost model is overlooked. A simple example will be used to
illustrate the distinction between the two types of cost
models and show how a sensitivity analysis performed with a
correlation cost model, rather than a causation model, can
lead to erroneous conclusions.

Example: Estimate the cost of assembling a piece of
hardware. The assembly consists merely of bolting various
elements together. The overwhelming majority of the cost
of the assembly process is the salary paid to the men who
do the bolting. Careful analysis of all the available cost
data might yield a correlation cost model given by Equation
1.


\[ C = a \times W \qquad (1) \]

where W is the total weight of all the bolts that go into
the assembly,

C is the cost of the assembly,
a is a regression coefficient.

By all of the various statistical measures of goodness
of fit, Model 1 is a valid prediction equation.

The causation cost model is given by Equation 2.

\[ C = k \times h \times n \qquad (2) \]

where k is the hourly wages of the assemblers,

h is the number of hours it takes to fasten and bolt,
n is the number of bolts used in the final assembly,

C is the cost of the assembly.

It should be noted that the correlation cost model and
the causation cost model are interrelated by Equation 3.

\[ W = B \times n \qquad (3) \]

where B is the weight of a single bolt,

W is the total weight of all of the bolts that go into
the assembly.

*Morris Zusman, "Use of Cost Models in Sensitivity Analysis and
as a Design Aid," Institute of Defense Analyses, N-587(R), September
1968. In this discussion, the term correlation is used figuratively
in the sense that it is statistically significant in explaining the
amount of variance rather than in the sense that both the dependent
and independent variables are random.

Thus any design or sensitivity analysis performed on
Equation 1, the correlation cost model, will lead to the
correct results if Equation 3 is not violated. For example,
an analyst would be correct in predicting that a cost reduction
would occur if he reduced the weight of the fasteners
used by using less fasteners. He would be incorrect if he
predicted a cost reduction would occur if he reduced the
weight of the fasteners by substituting aluminum for steel
bolts while keeping the number of bolts constant. The reason
that a substitution of aluminum for steel bolts would
not reduce the cost, is because the underlying relationship
between the number of bolts and the weight of the fasteners
(Equation 3), which is the reason for the good cost weight
relationship of the correlation model, has been violated.

In mathematical terms both a causation and a correlation
cost model have the following properties.

\[ \text{Cost} = f(\text{characteristics}) \qquad (4) \]

But only a causation model can be manipulated as Equation 5,

\[ \text{Characteristics} = f^{-1}(\text{cost}) \qquad (5) \]


The problem of determining whether a cost model is a
correlation or a causation model is, except for the trivially
simple type of problem illustrated here, very difficult
since all causation models can be transformed into correlation
models. There exist no statistical tests to determine
whether a model is a causation model or a correlation model.

The types of explanatory variables used in the cost
model generally will give a good guide as to whether a model
is a correlation model or a causation model. For example,
weight as an explanatory variable in a cost model where the
material cost did not dominate, would be a good indication
that the cost model was a correlation model.

If the model is a correlation model and the analyst performs
a sensitivity analysis, he runs the risk of violating
the unknown underlying relationships between the correlation
and causation models. If these underlying relationships are
violated the sensitivity analysis will be erroneous.

This example illustrates that regression analysis is an aid to, and not
a substitute for, experience and understanding.
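The bolt example can be made concrete with a few hypothetical numbers. Every value below (wage, hours, bolt weights, bolt count) is invented for illustration, and h is read here as hours per bolt, which is an assumption. The point is only that the correlation model tracks the causation model while Equation 3 holds and departs from it when bolt weight changes independently of bolt count.

```python
def causation_cost(k, h, n):
    """Equation 2: wage rate x hours per bolt x number of bolts."""
    return k * h * n

def correlation_cost(a, total_weight):
    """Equation 1: regression coefficient x total bolt weight."""
    return a * total_weight

# Hypothetical values, not taken from the memorandum.
k, h = 20.0, 0.05                    # dollars per hour, hours per bolt
steel_bolt, alu_bolt = 0.10, 0.035   # weight of one bolt, lb
n = 400                              # bolts in the assembly

# Coefficient implied by steel-bolt history, where W = B * n holds.
a = k * h / steel_bolt

true_cost = causation_cost(k, h, n)
steel_pred = correlation_cost(a, steel_bolt * n)
fewer_pred = correlation_cost(a, steel_bolt * (n - 100))  # fewer bolts
alu_pred = correlation_cost(a, alu_bolt * n)              # lighter bolts, same n

print(true_cost, steel_pred, fewer_pred, alu_pred)
```

Removing bolts lowers both models' costs together, whereas switching to lighter bolts lowers only the correlation model's prediction; the labor actually required is unchanged.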
Curvilinear Analysis

Until this point the analysis has been confined to a simple (one
explanatory variable) linear regression. Although a cursory examination
of the scatter diagram of cost versus weight illustrated in Fig.
1 indicates that a linear relationship may be adequate, it cannot be
concluded definitely that a curvilinear relationship might not be
preferable. These relationships can be examined by transforming the
data to permit the relationships to be estimated using linear
estimating techniques. The equation

\[ y = a + bx^2 \qquad (18) \]

can be estimated using the least-squares method by substituting
\(x^2\) for each x and solving the normal equations as before.
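A minimal sketch of this substitution follows, using hypothetical data generated from a known quadratic so that the recovered coefficients can be checked. The helper is an ordinary simple least-squares fit written out from the normal equations; its name and the data are assumptions made for illustration.

```python
def least_squares(xs, ys):
    """Simple least squares: returns (a, b) for y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    return y_mean - b * x_mean, b

# Hypothetical data lying exactly on y = 2 + 0.5 * x**2.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2 + 0.5 * v ** 2 for v in x]

# Substitute z = x**2 for each x, then fit the linear form y = a + b*z.
z = [v ** 2 for v in x]
a, b = least_squares(z, y)
print(round(a, 6), round(b, 6))
```

The fit is linear in the transformed variable, so the same machinery used for the straight-line case applies without change.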

Another type of nonlinear relationship that is frequently used and
that will be examined in discussing cost-quantity relationships in Sec.
V is of the form

\[ y = ax^b. \qquad (19) \]

For this form, a logarithmic transformation of both variables is made
to obtain an equation that is linear in the logarithms of the original
variables:

\[ \log y = \log a + b(\log x). \qquad (20) \]

The regression analysis is then conducted in terms of the logarithms
of the variables rather than in terms of the variables themselves.
(Throughout this section, logarithms to the base 10 will be used.*)

*It is possible to estimate relationships such as those represented
by Eq. (19) directly. For example, see C. A. Graver and H. E. Boren,
Jr., Multivariate Logarithmic and Exponential Regression Models, The
Rand Corporation, RM-4879-PR, July 1967. Although direct nonlinear
estimating techniques have some desirable properties, they are much less
widely used in cost analysis than the linear methods.
However, to permit the standard techniques of statistical inference
based on linear least-squares regression to be used, it is assumed that
the dependent variable \(\log y_i\) is linearly related to the
independent variable \(\log x_i\) and to the normally distributed
random variable \(u_i\) by the equation

\[ \log y_i = \log a + b \log x_i + u_i \quad (i = 1, \ldots, n). \qquad (21) \]

When antilogarithms are used, Eq. (21) is implicitly of the form

\[ y_i = a x_i^{b}\, 10^{u_i}. \qquad (22) \]

Because of this difference in form, statistics derived for Eq. (22) are
not directly comparable with those derived for Eq. (12). Similarly,
statistics on predictions made by the two models will not be easily
comparable because in the one case error is additive and in the
logarithmic case error is exponential and multiplicative.

The first step in estimating the coefficients for Eq. (20) is to
convert to logarithms the data for cost (in thousands of dollars) and
for weight shown in Table 1. The next step is to calculate the
least-squares estimates of b and log a. The results of these
calculations are

\[ \log y = -1.0425 + 1.0241(\log x), \qquad (23) \]

\(r^2\) = .560,
\(SE_{\log}\) = .2763,
\(t_b\) = 3.19,
df = 8.

The antilogarithms of both sides of Eq. (23) give

\[ y = (.09067)\,x^{1.0241}, \qquad (24) \]

where y = cost in thousands of dollars,
x = weight in pounds.
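The step from the logarithmic equation to the power form is a base-10 antilogarithm of the intercept. A short sketch, with illustrative function names, converts the fitted logarithmic coefficients and evaluates the resulting equation both ways:

```python
import math

LOG_A, B = -1.0425, 1.0241   # fitted coefficients in base-10 logarithms

a = 10 ** LOG_A              # multiplicative constant of the power form

def cost_thousands(weight_lb):
    """Estimated cost (thousands of dollars) from the power form."""
    return a * weight_lb ** B

def cost_via_logs(weight_lb):
    """Same estimate computed in logarithms, then converted back."""
    return 10 ** (LOG_A + B * math.log10(weight_lb))

print(round(a, 5))
print(round(cost_thousands(100), 3), round(cost_via_logs(100), 3))
```

Both routes give the same estimate, so either form can be used for prediction once the coefficients are known.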
Based on the coefficient of determination (\(r^2\)) and the calculated
t-value (\(t_b\)), these results appear to be slightly better than those
obtained with the linear case. However, care must be exercised in
comparing the logarithmic with the linear form and in evaluating the
logarithmic form itself. There are significant differences between the
two forms. A hint of these differences is given by the fact that the
standard error for the logarithmic case (\(SE_{\log}\)) is the standard
error of the logarithms of the original numbers and not the standard
error of the numbers themselves. For this reason, the standard error for
the logarithmic case (\(SE_{\log}\) = .2763) is about 20 times smaller
than the standard error for the arithmetic or linear case (SE = 5.808).
Thus, the relative sizes of these standard errors do not give a direct
indication of the equation that has the smaller standard error in terms
of the original numbers, which are the numbers of interest in cost
analysis.

A review of the manner in which least-squares estimators are
calculated will help to clarify this difference and to explain how these
results can be compared. The technique is to find a and b such that

\[ \sum (y_i - \hat{y}_i)^2 \qquad (25) \]

is minimized. In the logarithms of the numbers, however, this is
equivalent to finding the minimum value of

\[ \sum (\log y_i - \log \hat{y}_i)^2 = \sum \left[\log (y_i/\hat{y}_i)\right]^2, \qquad (26) \]

since \((\log y_i - \log \hat{y}_i) = \log (y_i/\hat{y}_i)\). Thus, by
transforming the variables to logarithms, the sum of the squares of the
logarithms of the ratios, rather than the sum of the squares of the
differences, between the observed and estimated values of y is
minimized.

The full impact of this change can be best illustrated by an
examination of the way in which the difference affects the calculation of
prediction intervals. To obtain prediction intervals for cost estimates
when a logarithmic equation is used, the intervals are first calculated
directly with the logarithmic data and they are then converted to
natural numbers. Thus, the end points of the interval in logarithmic
form are

\[ \log \hat{y} - A_{\log} \quad \text{and} \quad \log \hat{y} + A_{\log}, \qquad (27) \]

where

\[ A_{\log} = t_{e/2}\,(SE_{\log})\sqrt{1 + \frac{1}{n} + \frac{(\log x - \overline{\log x})^2}{\sum (\log x_i - \overline{\log x})^2}}. \]

For the case where log x equals the mean of the logarithms of the
sample x's, these end points become

\[ \log \hat{y} - (.2763)\,t_{e/2}\,(1.049) \quad \text{and} \quad \log \hat{y} + (.2763)\,t_{e/2}\,(1.049). \qquad (28) \]

When antilogarithms of these numbers are used, the following prediction
interval end points for the e level of significance are obtained:

\[ (\hat{y})\,10^{-.2898\,t_{e/2}} \quad \text{and} \quad (\hat{y})\,10^{.2898\,t_{e/2}}, \qquad (29) \]

which are equivalent to

\[ \frac{\hat{y}}{10^{.2898\,t_{e/2}}} \quad \text{and} \quad (\hat{y})\,10^{.2898\,t_{e/2}}. \qquad (30) \]

These results show that the prediction interval band for the original
numbers, based on a logarithmic regression analysis, is both
nonsymmetrical and proportional to the predicted values. Further, the
standard error for the logarithmic case (\(SE_{\log}\)) is more
comparable with the coefficient of variation (CV) for the arithmetic
case than it is with the standard error (SE) for the arithmetic case,
because the standard error for the logarithmic case (like the
coefficient of variation for the linear case) is a proportion.
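To see the nonsymmetry and proportionality numerically, the sketch below converts the logarithmic end points back to natural numbers for a 90-percent interval at the mean of the logarithms. The estimate of 10.0 (thousands of dollars) is a hypothetical value chosen for illustration; .2763, 1.049, and t = 1.86 are the figures quoted in the text, and the function name is an assumption.

```python
SE_LOG = 0.2763     # standard error in base-10 logarithms
ROOT_TERM = 1.049   # square-root term at the mean, as quoted in the text
T_05 = 1.86         # t for e/2 = .05 with 8 degrees of freedom

def log_interval(y_hat, t=T_05):
    """End points of the prediction interval in natural numbers."""
    factor = 10 ** (SE_LOG * ROOT_TERM * t)
    return y_hat / factor, y_hat * factor

y_hat = 10.0  # hypothetical estimate, thousands of dollars
lower, upper = log_interval(y_hat)
print(f"lower = {lower:.2f}, upper = {upper:.2f}")
print(f"below: {y_hat - lower:.2f}, above: {upper - y_hat:.2f}")
```

The interval is obtained by dividing and multiplying the estimate by the same factor, so the distance above the estimate always exceeds the distance below it, and both scale with the estimate.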
The band for the standard error is delineated by the following
locus of points for the various values of \(\hat{y}\):

\[ \frac{\hat{y}}{10^{.2763}} \quad \text{and} \quad (\hat{y})\,10^{.2763}. \qquad (31) \]

Thus, the upper and lower bounds of the standard error band at the
sample mean value of y (9.06) based on the logarithmic regression
analysis are given by the following numbers:

\[ \frac{9.06}{10^{.2763}} \quad \text{and} \quad (9.06)\,10^{.2763}, \]

which equal 4.80 and 17.12, respectively. When these numbers are
expressed as differences around the mean, 8.06 is obtained for the upper
half of the interval and 4.26 for the lower half.

Figure 10 shows a graph of the values of the standard error for
other values of y and the band for the 90-percent prediction intervals
plotted above and below the regression line. These bands about the
regression line illustrate both the nonsymmetry and the proportionality
of these measures for the logarithmic case: nonsymmetry in that the
distance between the regression line and the upper bounds is greater
than that for the lower bounds; and proportionality in that the bounds
become wider as y becomes larger. Because the standard error for the
logarithmic case is a constant percentage of y, the absolute value of
the bounds changes as the value of y changes.

Fig. 10 - Logarithmic equation with standard error and 90-percent
prediction intervals
[Figure omitted; vertical axis: Cost ($ thousands)]

In Fig. 11, an interval of plus and minus $5,808 (the amount of
the standard error in the arithmetic case) and the standard error as
shown in Fig. 10 have been plotted about the regression line that was
obtained with the logarithmic transformation. Figure 3 illustrates
the way in which the standard error based on the logarithmic regression
analysis compares with the results that were obtained from the
arithmetic equation. The interval of plus $5,808 intersects the upper
bound of the standard error at the point where x = 65 lb. The interval
of minus $5,808 intersects the lower bound at x = 121 lb. Thus, for
estimated values of y greater than $12,300, the interval based on the
value of the standard error of the arithmetic case is less than the
lower bound of the standard error calculated from the logarithmic
regression analysis. Similarly, for all estimated values of y beyond
the point at which the plus interval crosses the upper bound
(x = 65 lb), this interval is less than the upper bound (logarithmic
case).
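The two crossing points can be checked by solving for the estimate at which each logarithmic bound sits exactly 5.808 (thousand dollars) from the regression line, and then inverting the power-form equation for weight. This is a sketch under the assumption that the bounds are the estimate divided and multiplied by 10 raised to the logarithmic standard error; the names are illustrative.

```python
SE_ARITH = 5.808           # arithmetic standard error, thousands of dollars
SE_LOG = 0.2763            # logarithmic standard error (base 10)
A, B = 0.09067, 1.0241     # power form: y = A * x**B

factor = 10 ** SE_LOG

# Upper bound crossing: y*factor - y = SE_ARITH.
y_upper_cross = SE_ARITH / (factor - 1)
# Lower bound crossing: y - y/factor = SE_ARITH.
y_lower_cross = SE_ARITH / (1 - 1 / factor)

def weight_for_cost(y):
    """Invert the power form to find the weight giving estimate y."""
    return (y / A) ** (1 / B)

x_upper_cross = weight_for_cost(y_upper_cross)
x_lower_cross = weight_for_cost(y_lower_cross)
print(round(x_upper_cross), round(x_lower_cross), round(y_lower_cross, 1))
```

The computed weights round to the 65 lb and 121 lb given in the text, and the lower crossing corresponds to an estimate of about $12,300.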

On the basis of these considerations, it can be seen that
comparisons of the logarithmic results and the arithmetic results are
difficult and can often be misleading. Higher coefficients of
determination for the logarithmic case do not necessarily imply that
this case is better from the viewpoint of explaining cost variance in
the original numbers. Comparisons of the standard errors for these two
cases are usually not possible without a full examination of the
differences as illustrated in Figs. 10 and 11.


However, on the positive side, some relationships are, in fact,
nonlinear, and logarithmic transformations provide a practical means
for estimating nonlinear exponential relationships with linear
estimating techniques. Although there are techniques for estimating
exponential forms directly with nonlinear estimating techniques,* there
are also some difficulties in comparing and evaluating these results.
Because the direct estimating techniques for exponential forms are
nonlinear, they do not possess all the properties that are required to
permit the direct application of standard regression analysis.

Another useful application of logarithmic regression analysis
arises in cases in which empirical evidence or experience indicates
that the assumption of proportional variance, rather than constant
variance, seems more appropriate. Frequently, a simple scatter diagram
such as that shown in Fig. 6 is sufficient to indicate whether
proportional or constant variance is more appropriate. Alternatively,
the sample could be divided into two or more groups, and tests could be
performed on the means of the absolute values of the residuals in the
linear case in each group. If the higher values of the dependent
variable have residuals that are greater in value, the assumption of
proportional variance would be indicated. The use of a logarithmic
transformation is a convenient way to transform the data to conform to
the requirement of proportional variance. If constant variance is
assumed in the logarithms of the numbers, standard regression analysis
can be performed in the logarithms. However, the assumption of constant
variance in the logarithms implies proportional variance in the original
numbers.


Multiple  Regression  Analysis 


To this point, simple (one explanatory variable) regression
analysis has been used to examine both the linear and the nonlinear
relationship between cost and weight. With the array of data shown in
Table 1 and the logarithmic transformations of these data, multiple
(more than one explanatory variable) regression analysis will now be
examined. This section covers the multiple linear and the multiple
nonlinear (exponential) case; for the latter, logarithmic
transformations will be used. Because the sample documented in Table 1
contains only ten observations, the examination will be limited to
various combinations of two rather than three explanatory variables. If
additional observations were included in the sample, three explanatory
variables might be considered under certain circumstances; however, this
number of variables used with ten observations would detract from the
credibility of the results. In any event, there is no great loss in
limiting the number of variables to two; the essential differences
between simple and multiple regression can be illustrated with the
two-explanatory-variable case.

*See, for example, Graver and Boren.

In the linear case, the estimating equation is of general form

$$y = a + bx + cz. \tag{32}$$

The results for each of the possible combinations of two from the set
of three explanatory variables are as follows:

$$C = -3.752 + \underset{(2.61)}{.104}\,W + \underset{(1.72)}{.018}\,F, \tag{33a}$$

$$C = 2.930 + \underset{(1.12)}{.074}\,W + \underset{(0.19)}{.0047}\,P, \tag{33b}$$

$$C = -0.526 + \underset{(2.82)}{.045}\,P + \underset{(2.38)}{.027}\,F, \tag{33c}$$

where C = cost in thousands of dollars,
W = weight in pounds,
F = frequency in megahertz,
P = power in watts.
The number in parentheses below each of the estimated coefficients
is the value of the t-ratio for that coefficient. However,
since an additional variable has been added, the degrees of freedom
for these equations are 7 rather than 8, as they were for the simple case.
Thus, the appropriate value of t in testing the null hypothesis for
each of the coefficients is 1.895 rather than 1.860.
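A minimal sketch of how the coefficients and t-ratios of a two-explanatory-variable equation such as Eq. (32) can be computed is given below. The data are hypothetical (the sample in Table 1 is not reproduced here), and the function name is an assumption made for the example; only the standard least-squares formulas are used.

```python
import math


def two_variable_regression(y, x, z):
    """Least-squares fit of y = a + b*x + c*z.
    Returns the coefficients, the t-ratios of b and c, and the degrees of freedom."""
    n = len(y)
    y_bar, x_bar, z_bar = sum(y) / n, sum(x) / n, sum(z) / n
    dx = [v - x_bar for v in x]
    dz = [v - z_bar for v in z]
    dy = [v - y_bar for v in y]
    s_xx = sum(v * v for v in dx)
    s_zz = sum(v * v for v in dz)
    s_xz = sum(p * q for p, q in zip(dx, dz))
    s_xy = sum(p * q for p, q in zip(dx, dy))
    s_zy = sum(p * q for p, q in zip(dz, dy))
    # det shrinks toward zero as x and z become more interdependent,
    # which inflates the standard errors computed below.
    det = s_xx * s_zz - s_xz ** 2
    b = (s_zz * s_xy - s_xz * s_zy) / det
    c = (s_xx * s_zy - s_xz * s_xy) / det
    a = y_bar - b * x_bar - c * z_bar
    sse = sum((yi - (a + b * xi + c * zi)) ** 2 for yi, xi, zi in zip(y, x, z))
    dof = n - 3                      # three coefficients are estimated
    s2 = sse / dof
    se_b = math.sqrt(s2 * s_zz / det)
    se_c = math.sqrt(s2 * s_xx / det)
    return (a, b, c), (b / se_b, c / se_c), dof


# Hypothetical sample of ten observations.
x = [float(v) for v in range(1, 11)]
z = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
y = [1.0 + 2.0 * xi + 0.5 * zi + (0.1 if i % 2 == 0 else -0.1)
     for i, (xi, zi) in enumerate(zip(x, z))]

coefficients, t_ratios, dof = two_variable_regression(y, x, z)
significant = [abs(t) > 1.895 for t in t_ratios]   # critical t for 7 degrees of freedom
```

With ten observations and three estimated coefficients the function returns 7 degrees of freedom, so each t-ratio is compared with 1.895, as in the text.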

To understand the use of t-ratios in multiple regression equations,
the meaning of the multiple regression coefficients must be understood.
In each case, the multiple regression coefficient shows the net effect
of an explanatory variable. For example, Eq. (33a) can be interpreted
as follows: For a given frequency, a 1-lb increase in weight will cause
a $104 increase in cost. Alternatively, for a given weight, a 1-MHz
change will cause the expected cost to change by $18. As the
independence between the explanatory variables decreases, the validity of this
interpretation and the use of multiple variables diminish. For example,
if weight and frequency are related in such a way that a change in
weight cannot be assumed with frequency constant, the use of both
variables in a single multiple regression equation can produce spurious
results (e.g., the wrong sign on a coefficient, such as a negative sign
for the weight coefficient).

Fortunately, there are quantitative indicators that are useful in
evaluating empirically the significance of such interdependencies on
regression results. Allowance for interdependence is built into the
formula for calculating the standard error of each coefficient in
multiple regression equations. Thus, the t-ratios in a multiple
regression not only serve to indicate the significance (or nonsignificance)
of each of the explanatory variables but also indicate when there is
an unacceptably strong relationship between these variables.

From Eq. (33b), it can be seen that the inclusion of power with
weight causes weight to become nonsignificant at the 10-percent level
of significance. Weight was, however, significant at this level in
the simple regression case. The coefficient of determination between
weight and power is .533, which indicates that over 50 percent of the
total variance in weight could be explained by a regression of weight
on power. Thus, the adverse effect on the significance of weight that
results from the inclusion of power can be attributed to the existence
of interdependence between these two variables.

As the degree of interdependence increases, regression results
become less stable and more indeterminate. As a consequence, the t-ratio
should not be the sole test for assessing the amount of interdependence
present. Further, it is not possible to give a precise cutoff point
at which explanatory variables must always be considered too
interdependent. A coefficient of .9 or more will almost certainly cause
problems; one of .3 or less usually will not. The array of
correlations and coefficients of determination among the explanatory variables
should always be examined in the early stages of analysis, and, to the
extent possible, the use of interdependent explanatory variables should
be avoided.
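The early screening of explanatory variables recommended above can be sketched as follows. The data and the function names are hypothetical, and the sketch assumes that the .9 and .3 guideposts are applied to the coefficient of determination between each pair of explanatory variables.

```python
def r_squared(u, v):
    """Coefficient of determination between two variables."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv ** 2 / (s_uu * s_vv)


def screen_pairs(variables, high=0.9, low=0.3):
    """Return {(name1, name2): (r2, remark)} for every pair of variables."""
    names = list(variables)
    report = {}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r2 = r_squared(variables[names[i]], variables[names[j]])
            if r2 >= high:
                remark = "almost certainly a problem"
            elif r2 <= low:
                remark = "usually acceptable"
            else:
                remark = "judgment required"
            report[(names[i], names[j])] = (r2, remark)
    return report


# Hypothetical explanatory variables for ten items of equipment.
variables = {
    "weight": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "frequency": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0],
    "power": [3.2, 4.9, 7.1, 9.0, 11.2, 12.8, 15.1, 17.0, 18.9, 21.1],
}
report = screen_pairs(variables)
```

In this made-up sample, weight and power move almost in lockstep and would be flagged, while weight and frequency are nearly independent.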

It is also possible for variables to be nonsignificant in multiple
regression equations, even when there is no high level of
interdependence. For example, in Eq. (33a) the coefficient of frequency is
nonsignificant at the 10-percent level although the coefficient of
determination between frequency and weight is only .091. Frequency in
conjunction with weight is simply not a useful explanatory variable.
Regardless of the reason, nonsignificant variables should not
ordinarily be retained in regression equations used for cost estimating. Only
one of the three multiple regression equations shown above produces an
acceptable result: This is Eq. (33c), in which frequency and power are
used as explanatory variables, and both are statistically significant.

The question arises: For cost-estimating purposes, is the multiple
regression with power and frequency preferable to the simple regression
with weight as the explanatory variable? To find an answer, the other
measures by which the regression equations are judged must be compared:
the standard error of estimate, the coefficient of variation, and the
coefficient of determination. These are shown in Table 4 for each of
the multiple regressions for comparison with the results obtained from
the simple regression. The primary concern in this comparison is
between the multiple regression with frequency and power and the simple
regression with weight, since the power and frequency equation is the
only one in which both the explanatory variables are significant. For
completeness, however, the results for all three of the linear multiple
regressions are shown and will be discussed.

In the limiting case of the explanatory variable regressions
in which one variable is an exact linear function of another, the
regression results become completely indeterminate since the attempt is
then to fit a plane in two dimensions, and there are an infinite
number of planes intersecting each line in the two-dimensional space.

An excellent discussion of this point is found in John Johnston,
Econometric Methods, McGraw-Hill Book Company, Inc., New York, 1963, pp.
201-207.


                              Table 4

         COMPARISON OF MULTIPLE-LINEAR WITH SIMPLE-LINEAR
                        REGRESSION RESULTS

                                   Explanatory Variables
                      ----------------------------------------------
                                  Weight       Weight     Frequency
Statistical                         and          and         and
Measures               Weight    Frequency      Power       Power
--------------------------------------------------------------------
Standard error          5.808      5.204        6.192       4.999
Coefficient of
  variation             0.641      0.574        0.683       0.552
Coefficient of
  determination         0.325      0.526        0.329       0.563
Degrees of freedom          8          7            7           7

Equation (33a), in which weight and frequency are used, appears to
give slightly better results in a comparison of the other measures.
However, the coefficient of the frequency variable is not significant
at the 10-percent level. As a consequence, the improvement is not a
statistically significant one. The generalized test to determine
whether the incremental improvement associated with the addition of a
variable is significant uses an F-statistic.* The test performed with
this statistic is similar to the t-test. In this case, the null
hypothesis is that the increment is not significant. The statistic
used to test this null hypothesis is

$$F = \frac{\text{Increment of explained variance} \div \text{degrees of freedom}}{\text{Remaining unexplained variance} \div \text{degrees of freedom}}.$$

This can be rewritten as

$$F = \frac{R^2 - r^2}{(1 - R^2)/7}, \tag{34}$$

where $R^2$ = the coefficient of determination of the equation that
includes weight and frequency,
$r^2$ = the coefficient of determination of the equation with weight
alone.

Equation (34) shows only 1 degree of freedom involved in the numerator,
which is the incremental degree of freedom lost by adding another
coefficient. The degrees of freedom in the denominator equal the number
of observations in the sample less the number of coefficients estimated
(10 - 3 = 7).

*See F. E. Croxton, D. J. Cowden, and S. Klein, Applied General
Statistics, 3d ed., Prentice-Hall, Inc., Englewood Cliffs, New Jersey,
1960, p. 627.

Substituting the appropriate coefficients of determination in the
formula for the F-statistic, we obtain

$$F = \frac{.526 - .325}{(1 - .526)/7} = \frac{(.201)(7)}{.474} = 2.97. \tag{35}$$


This value falls short of the critical value of F, which equals 3.59
(the square of the critical t value of 1.895) at the 10-percent level of
significance with 1 and 7 degrees of freedom. Thus, the null hypothesis
is accepted, and we conclude that the net increment in explained variance
associated with the addition of frequency to the equation containing
weight is insufficient to establish that the improvement is not due to
chance.
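The incremental F-test can be written as a small function. The sketch below uses the coefficients of determination from Table 4; the function and parameter names are assumptions made for the example.

```python
def incremental_f(r2_full, r2_reduced, n_obs, n_coef_full, n_added=1):
    """F-statistic for the variance explained by the added variable(s).

    r2_full     -- coefficient of determination with the added variable(s)
    r2_reduced  -- coefficient of determination without them
    n_obs       -- number of observations in the sample
    n_coef_full -- coefficients estimated in the full equation (incl. constant)
    n_added     -- number of variables added (numerator degrees of freedom)
    """
    dof_denominator = n_obs - n_coef_full
    explained = (r2_full - r2_reduced) / n_added
    unexplained = (1.0 - r2_full) / dof_denominator
    return explained / unexplained


# Weight alone (.325) versus weight and frequency (.526), ten observations.
f_stat = incremental_f(0.526, 0.325, n_obs=10, n_coef_full=3)

# When a single variable is added, F is the square of that variable's
# t-ratio: the t-ratio of 1.72 for frequency in Eq. (33a) gives about 2.96.
t_squared = 1.72 ** 2
```

The two figures agree to within rounding, which is a convenient cross-check on the arithmetic.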

In Eq. (33b), in which weight and power are used as explanatory
variables, it can be seen that the loss of the degree of freedom
associated with adding another variable more than offsets the slight increase
in the proportion of explained variance ($R^2$). As a result, the
standard error in this case is greater than it is for the case where weight
is used alone (6.192 versus 5.808). Thus, not only are the variables
not significant, but the equation would also produce slightly less
satisfactory (larger) prediction intervals than simple regression,
although the coefficient of determination is slightly larger.
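The trade-off between a higher coefficient of determination and a lost degree of freedom can be checked numerically. The sketch below relies on the standard relation that the squared standard error equals the unexplained share of the total variation divided by the degrees of freedom, so that for the same dependent variable the standard error is proportional to the square root of (1 - R^2)/degrees of freedom. The function name is hypothetical; the inputs are the Table 4 values.

```python
import math


def implied_standard_error(se_base, r2_base, dof_base, r2_new, dof_new):
    """Standard error of a second equation for the same dependent variable,
    implied by the standard error of a base equation."""
    ratio = ((1.0 - r2_new) / dof_new) / ((1.0 - r2_base) / dof_base)
    return se_base * math.sqrt(ratio)


# Base case: weight alone (standard error 5.808, R^2 = .325, 8 degrees of freedom).
se_weight_power = implied_standard_error(5.808, 0.325, 8, 0.329, 7)
se_weight_frequency = implied_standard_error(5.808, 0.325, 8, 0.526, 7)
se_frequency_power = implied_standard_error(5.808, 0.325, 8, 0.563, 7)
```

The implied values come out close to the 6.192, 5.204, and 4.999 shown in Table 4; the small differences reflect rounding of the published coefficients of determination.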

Equation (33c), in which power and frequency are used as
explanatory variables, compares favorably with the simple regression in which
weight is used, and thus far appears to be the best estimating equation
derived. However, to complete the analysis, the nonlinear equations
should be examined. These equations, expressed in the logarithms of
the original numbers, have the general form

$$\log y = \log a + b(\log x) + c(\log z). \tag{36}$$

The results for each of the possible different combinations of two that
can be developed from the set of three explanatory variables are as
follows:


$$\log C = -1.8576 + \underset{(3.78)}{1.1385}(\log W) + \underset{(1.62)}{.2743}(\log F), \tag{37a}$$

$$\log C = -0.6582 + \underset{(1.46)}{.7245}(\log W) + \underset{(.842)}{.1342}(\log P), \tag{37b}$$

$$\log C = -1.1933 + \underset{(8.44)}{.5756}(\log P) + \underset{(5.91)}{.6085}(\log F), \tag{37c}$$

where C = cost in thousands of dollars,
W = weight in pounds,
F = frequency in megahertz,
P = power in watts.

The  other  measures  required  to  complete  the  comparisons  between  the 
various  equations  are  shown  in  Table  5. 

The major patterns in the nonlinear multiple regression equations
compared with the nonlinear simple case are similar to those for the
linear equations. The use of both frequency and weight produces
slightly better results, but the coefficient of the frequency variable
is not statistically significant at the 10-percent level. The use of
power with weight again produces a larger standard error than the
simple case although the coefficient of determination is slightly larger.
In all respects, the best nonlinear equation is the equation that uses
power and frequency as explanatory variables. In addition, this
nonlinear equation has a significantly larger coefficient of determination
than the best linear equation. The best linear equation also uses power
and frequency and has a coefficient of determination of .563. The
nonlinear form has a coefficient of determination of .913.


                              Table 5

      COMPARISON OF MULTIPLE-NONLINEAR WITH SIMPLE-NONLINEAR
                        REGRESSION RESULTS

                                   Explanatory Variables
                     --------------------------------------------------------
                               Log Weight       Log Weight    Log Frequency
Statistical            Log        and              and             and
Measures              Weight   Log Frequency    Log Power       Log Power
-----------------------------------------------------------------------------
Standard error        0.2763     0.2518           0.2814          0.1312
Coefficient of
  determination       0.560      0.680            0.600           0.913
Degrees of freedom        8          7                7               7

The remaining question is whether the nonlinear results are
sufficiently superior to the linear results to conclude that the nonlinear
equation should be used in preference to the linear one. The standard
error for each in the original numbers at the mean and as a percentage
of the mean should be compared. If the results show that the standard
error for the nonlinear case is smaller, this evidence, and the fact
that the coefficient of determination for the nonlinear case is much
larger, can be used as a basis to judge in favor of the nonlinear form.

When the formulas shown in Eq. (31) are used, the end points that
delineate the standard error at the mean for the nonlinear equation are

$$\frac{9.060}{10^{.1312}} \quad \text{and} \quad (9.060)\,10^{.1312}.$$

When the end points are simplified, the following values are obtained:

6.698 and 12.255.
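The conversion of a standard error expressed in base-10 logarithms into end points in the original numbers can be reproduced as follows. The function name is an assumption for the example; the mean of 9.060 and the logarithmic standard error of .1312 are taken from the text and Table 5.

```python
def band_in_original_units(mean, se_log10):
    """End points of a one-standard-error band around `mean` when the
    standard error is expressed in base-10 logarithms.

    Returns (lower, upper, lower_fraction, upper_fraction), where the
    fractions are the distances from the mean as a share of the mean."""
    factor = 10.0 ** se_log10
    lower = mean / factor
    upper = mean * factor
    return lower, upper, (mean - lower) / mean, (upper - mean) / mean


lower, upper, lower_fraction, upper_fraction = band_in_original_units(9.060, 0.1312)
```

Because the band is symmetric in the logarithms, it is asymmetric in the original numbers: the upper end point lies farther from the mean than the lower one.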


These results, expressed as differences from the mean, give values of
2.362 below and 3.195 above the mean. Thus, the lower bound of the
standard error for the nonlinear case is 26 percent of the mean and the
upper bound is 35 percent. This compares favorably with the coefficient
of variation from the linear case, which is about 55 percent. Thus,
given the inherent limitations of the small sample size of 10, the use
of the nonlinear form improves the results significantly. The preferred
equation is


$$\log C = -1.1933 + .5756(\log P) + .6085(\log F), \tag{38}$$

or

$$C = 10^{-1.1933}\,P^{.5756}\,F^{.6085},$$

where C = cost in thousands of dollars,
P = power in watts,
F = frequency in megahertz,
log = logarithm base 10.

This  equation  is  also  acceptable  on  logical  grounds  since  the  estimated 
relationships  between  cost  and  power  and  cost  and  frequency  are  positive. 
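Applying the preferred equation is a matter of evaluating the logarithmic form and taking the antilogarithm. The sketch below assumes that the second explanatory variable in Eq. (38) is frequency, as the variable definitions indicate; the function name and the input values (100 W, 100 MHz) are hypothetical and are not drawn from the sample.

```python
import math


def estimate_cost(power_watts, frequency_mhz):
    """Estimated cost in thousands of dollars from Eq. (38)."""
    log_cost = (-1.1933
                + 0.5756 * math.log10(power_watts)
                + 0.6085 * math.log10(frequency_mhz))
    return 10.0 ** log_cost


# Hypothetical item: 100 W and 100 MHz.
cost = estimate_cost(100.0, 100.0)

# Both exponents are positive, so cost rises with either characteristic.
cost_more_power = estimate_cost(200.0, 100.0)
cost_more_frequency = estimate_cost(100.0, 200.0)
```

The estimate applies only within the range of power and frequency covered by the sample from which the equation was derived.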


Documentation 


Once an estimating relationship has been developed, a report that
documents the data, assumptions, and analytical results is
indispensable. The following guidelines for preparing the report are suggested:

1.  Describe  the  scope  and  coverage  of  the  study  and  of  the  equa¬ 
tions  that  have  been  developed. 

2. Assuming that the study has provided for a survey of work
already performed in the area of interest (a desirable part
of any cost-research study), prepare a summary of the survey
results.

3. Describe the major input data used in the study. The raw and
adjusted data, which include data for both the dependent and
explanatory (independent) variables, should be documented to
the extent that is feasible. Include data not only for those
cost categories and characteristics used in the final estimating
equations, but also for those characteristics that were
considered but were eliminated in the process of analysis.
Describe and explain fully any adjustments to the raw data;
indicate limitations and accuracy. Because one of the outputs
of a cost-research study is the data base itself, documentation
should be such that the data base will be useful in future
studies.

4.  Identify  sources  and  dates  of  the  data. 

5. Define each dependent and explanatory variable considered in
the study. (Unambiguous definitions of weapon system
characteristics and cost elements are usually more involved than
appears at first glance.)

6. Provide the major dependent- versus single-explanatory-variable
scatter diagrams used in the study. The diagrams should be
labeled to identify each data point.

7.  Document  the  final  equations  as  well  as  the  other  major  equa¬ 
tion  forms  examined  in  the  study;  include  such  statistics  as 
the  standard  error  of  estimate,  coefficient  of  determination, 
coefficient  of  variation,  and  prediction  intervals  to  the 
extent  that  they  are  derived  for  each  equation.  Other  criteria 
that  are  considered  appropriate  for  indicating  the  goodness  of 
fit  and  prediction  capabilities  of  the  equations  should  be 
described . 

8. For the major final equations, prepare a table such as Table
6 to show the observed values of the dependent variables, the
estimated values, the deviations, and the percent deviation
from the observed values. In addition, prepare a scatter
diagram, such as that illustrated in Fig. 12, on which the
observed values versus the estimated values are plotted. The
points on the diagram should be labeled to identify each item.
(Figure 12 shows that the apparent problem of stratification
illustrated in Fig. 4 has been eliminated by including
frequency as an explanatory variable.)

9. Describe the alternative equations that were considered and
why they were rejected. The report should convey a sense of
the improvement that results from a high degree of selectivity
in choosing the final forms. The alternative equations could
show

a.  The  use  of  different  explanatory  variables; 

b. Different forms of the equations, e.g., linear,
multiplicative (linear in the logarithms), or other nonlinear
forms;

c.  The  use  of  different  forms  of  the  dependent  variables, 
e.g.,  cost  per  pound  or  cost  per  item; 

d.  The  use  of  stratified  dependent  variables  grouped  into 
subcategories  that  are  determined  by  such  factors  as  ship 
or  missile  type,  weight,  frequency,  or  speed  regime. 

10. Describe any special methodology (e.g., a sophisticated
mathematical approach) in an appendix if it is only of special interest.

11.  Describe  the  cost-estimating  methods  fully  and  clearly.  It 
should  be  possible  to  reconstruct  the  results  of  the  study 
from the data base as it is given in the report. The major
assumptions, statistical and otherwise, used in the
derivation of the equations should be explicitly stated.


                              Table 6

   ACTUAL AND ESTIMATED COSTS OF AIRBORNE COMMUNICATION EQUIPMENT

                               Deviation
  Actual       Estimated     (Actual less
   Cost          Cost          estimate)       Percent
   ($)           ($)             ($)          Deviation
---------------------------------------------------------
  22,200        13,768          +8,432           +38
  17,300        15,970          +1,330            +8
  11,800        17,388          -5,588           -47
   9,600         9,238            +362            +4
   8,800         9,238            -438            -5
   7,600         6,435          +1,165           +15
   6,800         6,885             -85            -1
   3,200         4,581          -1,381           -43
   1,700         2,062            -362           -21
   1,600         1,261            +339           +21

Average of absolute value of percent deviations = 20


Fig. 12 - Actual cost versus estimated cost
(vertical axis: actual cost, $ thousands; horizontal axis: estimated
cost, $ thousands)

12. Provide an example to illustrate the procedure for using the
final cost-estimating relationship.

13.  Describe  the  limitations  of  the  final  equations  as  specifi¬ 
cally  as  possible.  State  the  range  of  characteristics  over 
which  the  estimating  procedure  applies  and  any  other  restric¬ 
tions  on  the  population  covered  by  the  equations. 
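The deviation table called for in guideline 8 can be generated mechanically. The sketch below uses the actual and estimated costs from Table 6; the function name is an assumption, and the second estimated cost is taken as 15,970, the value implied by the printed deviation of +1,330.

```python
def deviation_table(actual, estimated):
    """Rows of (actual, estimated, deviation, percent deviation), where the
    deviation is actual less estimate and the percentage is taken on the
    observed (actual) value."""
    rows = []
    for a, e in zip(actual, estimated):
        deviation = a - e
        rows.append((a, e, deviation, 100.0 * deviation / a))
    return rows


actual = [22200, 17300, 11800, 9600, 8800, 7600, 6800, 3200, 1700, 1600]
# Second entry assumed to be 15,970 (consistent with the deviation of +1,330).
estimated = [13768, 15970, 17388, 9238, 9238, 6435, 6885, 4581, 2062, 1261]

rows = deviation_table(actual, estimated)
mean_abs_percent = sum(abs(row[3]) for row in rows) / len(rows)

for a, e, dev, pct in rows:
    print(f"{a:>8,} {e:>8,} {dev:>+8,} {pct:>+6.0f}")
```

The average of the absolute percent deviations comes out at about 20 percent, in agreement with the figure given beneath Table 6.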




BIBLIOGRAPHY 


Croxton, F. E., D. J. Cowden, and S. Klein, Applied General Statistics,
3d ed., Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1960.

Ezekiel, M. J. B., and K. Fox, Methods of Correlation and Regression
Analysis, 3d ed., John Wiley & Sons, Inc., New York, 1959.

Graver, C. A., and H. E. Boren, Jr., Multivariate Logarithmic and
Exponential Regression Models, The Rand Corporation, RM-4879-PR, July
1967.

Johnston, John, Econometric Methods, McGraw-Hill Book Company, Inc.,
New York, 1963.

Mood, A. M., and F. A. Graybill, Introduction to the Theory of
Statistics, 2d ed., McGraw-Hill Book Company, Inc., New York, 1963. First
edition by A. M. Mood published in 1950.

Spurr, W. A., and C. P. Bonini, Statistical Analysis for Business
Decisions, Richard D. Irwin, Inc., Homewood, Illinois, 1967.

Spurr, W. A., L. S. Kellogg, and J. H. Smith, Business and Economic
Statistics, Richard D. Irwin, Inc., Homewood, Illinois, 1961.

Wallis, W. A., and H. V. Roberts, Statistics, The Free Press, New York,
1963.

Zusman, Morris, "Use of Cost Models in Sensitivity Analysis and as a
Design Aid," Institute of Defense Analyses, N-587-R, September 11,

IV. USE OF COST-ESTIMATING RELATIONSHIPS


THE WIDESPREAD USE of estimating relationships in the form of simple
cost factors, equations, curves, nomograms, and rules of thumb attests
to  their  value  and  to  the  variety  of  situations  in  which  they  can  be 
helpful.  But  an  estimating  relationship  can  only  be  derived  from  in¬ 
formation  on  past  occurrences,  and  the  past  is  not  always  a  reliable 
guide  to  the  future.  As  all  horseplayers  know,  the  favorite  runs  out 
of  the  money  often  enough  to  prove  that  an  estimate  based  on  past  per¬ 
formance  is  very  likely  to  be  wrong.  Admittedly,  there  may  be  other 
factors  at  work  in  a  horserace,  but  the  problem  remains  the  same  as 
that  encountered  in  any  attempt  to  predict  the  course  of  future  events, 
i.e.,  how  much  confidence  can  be  put  in  the  prediction?  This  question 
dominates  all  other  considerations  in  any  discussion  of  the  use  of  esti¬ 
mating  relationships. 

These remarks are not intended to depreciate the value of
estimating relationships. They are an important tool in an estimator's kit
and, in many cases, the only tool. Thus, it is essential that their
limitations be understood to preclude their improper use. The
limitations of estimating relationships stem from two sources: first, the
uncertainty inherent in any application of statistics and second, the
uncertainty that an estimating relationship is applicable to a
particular article. The first pertains primarily to articles well within
the bounds of the sample on which the relationship is based; even here,
uncertainty may be found. The second source refers to those cases in
which the article has characteristics somewhat different from those of
the sample. Although extrapolation beyond the sample is universally
deplored by statisticians, it is universally practiced by cost analysts
in dealing with advanced hardware because, in most instances, it is
precisely those systems outside the range of the sample that are of
interest. The practical question is whether the equation is relevant to
the case under investigation, even though good statistical practice would
question the validity of such an approach.


Characteristics of the Estimating Relationship

The degree of emphasis placed on statistical treatment of data can
cause two fundamental points to be overlooked: first, that an
estimating relationship must be reasonable and second, that it must have
predictive value.

Reasonableness can be tested in various ways: by inspection, by
simple plots, and by complicated techniques that involve an examination
of each variable over a range of possible values. Inspection will often
suffice to indicate that an estimating relationship is not structurally
sound. For example, the following equation is the result of an
exercise at the Air Force Institute of Technology in which students were
asked to develop cost-estimating relationships for small missiles:

$$C = 8347.5 + 150.6W - 1149.1R, \tag{1}$$

where C = cost of airframe + guidance and control,
W = weight in pounds,
R = range in miles.
This equation fits the data very well, but it states that as range
increases, the cost decreases; such an assumption appears to be in error.
If cost is a function of range, the relationship should be direct
rather than inverse. To investigate further, choose two hypothetical
but reasonable values for W and R within the range of the sample data:
38.5 - 157 lb for W, 5.0 - 14.8 mi for R. Table 1 shows that Missile
B, although heavier and with greater range than Missile A, is estimated
as the cheaper of the two, which is contrary to experience. A
reexamination of the sample data and the equation is in order.

                              Table 1

               SAMPLE COST COMPARISON OF TWO MISSILES

                Airframe Weight                  Estimated Airframe Cost
Hypothetical    + Guidance and Control   Range   + Guidance and Control
Missile                 (lb)             (mi)             ($)
------------------------------------------------------------------------
A                        50                5            10,132
B                        75               10             8,152
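The reasonableness check shown in Table 1 amounts to evaluating the equation at a few plausible points and looking at the direction of the change. In the sketch below the coefficients of Eq. (1) are taken as 150.6 for weight and -1149.1 for range, values that reproduce the two costs in Table 1; the function name is hypothetical.

```python
def missile_cost(weight_lb, range_mi):
    """Estimated cost of airframe + guidance and control from Eq. (1)."""
    return 8347.5 + 150.6 * weight_lb - 1149.1 * range_mi


cost_a = missile_cost(50, 5)     # hypothetical Missile A
cost_b = missile_cost(75, 10)    # hypothetical Missile B: heavier, longer range

# Effect of one additional mile of range with weight held constant.
range_effect = missile_cost(50, 6) - missile_cost(50, 5)
```

The negative range effect, more than $1,100 per additional mile, is what makes the heavier, longer-range missile come out cheaper and signals that the equation needs reexamination.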

When an estimating relationship is developed to make a particular
estimate, it may have little predictive value outside a narrow range.
As an example, consider the following equation for estimating the cost
of solid-propellant motors for small missiles:

$$\text{Cost} = 1195.6 + .0000031\,I^2, \tag{2}$$

where I = total impulse.

The equation fits the sample data very well:

Missile      Observed Cost     Estimated Cost
Motor             ($)               ($)
---------------------------------------------
A                2600              2660
B                1700              1693
C                1250              1265
D                1750              1781

If it were appropriate to use statistical measures for a sample of 4,
Eq. (2) would explain over 99 percent of the total variance. But note that
the constant 1195.6 accounts for 94 percent of the cost of Motor C and
that the cost of all motors smaller than Motor C will be about $1200.
Because of the $I^2$ term, the influence of total impulse is likely to be
too pronounced for motors larger than those in the sample.

A common method of examining the implications of an estimating
relationship for values outside the range of the sample is to plot a
scaling curve as shown in Fig. 1. Scaling curves may be plotted on either
arithmetic or logarithmic graph paper as Fig. 1 illustrates; cost
analysts usually prefer the log-linear representation. The theory on which
a scaling curve is based is as follows: As an item increases in weight
(or another dimension), the incremental cost of each additional pound
(or square foot, watt, horsepower) will decrease or increase in a
predictable way. Thus, in Fig. 1 the cost per pound of an electrical
power subsystem in a manned spacecraft decreases from about $4200 to
$1400 as the total weight increases from 100 to 1000 lb. The slope of
the curve is fairly steep; if the curve were extended to the right, it
might be expected to flatten. Eventually, the curve might become
completely flat at the point at which no more economies of scale can be
realized, but it is unlikely that the slope would ever become positive.

Fig. 1 - Scaling curve: cost per pound versus dry weight

Now examine Fig. 2 in which total impulse is plotted against cost
per pound-second based on values obtained from an estimating
relationship. Two differences are immediately seen. First, the lefthand
portion of the curve is unusually steep. Second, the slope becomes
positive when total impulse exceeds about 22,000 lb-sec. In some instances,
fabrication problems increase with the size of the object being
fabricated and a positive slope may result. No such problems are encountered
in the manufacture of small, solid-propellant rocket motors, however,
and continued economies of scale are to be expected.
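The behavior of an equation with a constant term and a squared term can be examined by dividing the estimated cost by total impulse. The sketch below takes the constant and coefficient of Eq. (2) as printed (1195.6 and .0000031); the function names are assumptions, and the turning point it computes is only as accurate as those rounded coefficients.

```python
import math

CONSTANT = 1195.6
COEFFICIENT = 0.0000031


def motor_cost(total_impulse):
    """Estimated motor cost in dollars from Eq. (2)."""
    return CONSTANT + COEFFICIENT * total_impulse ** 2


def cost_per_pound_second(total_impulse):
    """Estimated cost divided by total impulse (lb-sec)."""
    return motor_cost(total_impulse) / total_impulse


# Share of the estimated cost of Motor C ($1265) contributed by the constant.
constant_share = CONSTANT / 1265.0

# Cost per lb-sec is CONSTANT / I + COEFFICIENT * I, which reaches its
# minimum where the two terms are equal, i.e., at I = sqrt(CONSTANT / COEFFICIENT).
turning_point = math.sqrt(CONSTANT / COEFFICIENT)

falling = cost_per_pound_second(10_000) > cost_per_pound_second(turning_point)
rising = cost_per_pound_second(30_000) > cost_per_pound_second(turning_point)
```

With the coefficients as printed, the unit cost stops falling at roughly 20,000 lb-sec, in the neighborhood of the value of about 22,000 lb-sec cited for Fig. 2; beyond that point the squared term dominates and the slope turns positive.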

Figure 2 illustrates another point: A more useful estimating
relationship could have been obtained by drawing a trend line rather than
by fitting a curve to the four data points. With a small sample, it is
often possible to write an equation that fits the data perfectly, but
the equation is useless outside the range of the sample. Statistical
manipulation of a sample this size rarely produces satisfactory results.

Fig. 2 - Cost per pound-second versus total impulse
(horizontal axis: total impulse, lb-sec)

A final example of the kind of error that undue reliance on statistical measures of fit may bring about is based on an estimating equation for aircraft airframes. Initially, the equation for estimating airframe production labor hours was based on a sample of 44 aircraft. It then seemed that a grouping of the aircraft by type should give better correlation and, in fact, when the bombers, fighters, trainers, and cargo aircraft were considered separately, the average deviation between estimates and actual values was markedly reduced.

For example, in the case of trainer aircraft, the average deviation was reduced from 20 to 6 percent, and a more useful estimating relationship was obtained. In the case of fighters, however, although average deviation was reduced from 15 to 11 percent, the estimating equation exhibited the flaw shown in Eq. (3):


Manufacturing hours = 4.28 (weight)^[exponent illegible] (speed)^[exponent illegible]    (3)




The exponent of weight is greater than 1.0, which means that when speed is held constant and weight increased, the man-hours per pound of airframe weight will increase. This can be seen in Fig. 3. The dashed lines show scaling curves derived from the total sample of 44 aircraft. These portray the normal relationship: as weight increases, hours per pound decrease. The regression equation gives the opposite results because the general trend in fighter aircraft has been for increased speed to be accompanied by increased weight, which causes an emphasis on the weight variable. It cannot be assumed, however, that all new fighters will conform to this trend; the equation, if used at all, would have to be used with great care.

Fig. 3 -- Comparison of regression lines with scaling curves

The advice is frequently given that an estimating relationship should not be used mechanically. This implies (1) that the function must be thoroughly understood and (2) that the hardware involved must be understood as well. To illustrate the first point, examine an estimating relationship for direct manufacturing hours derived from a sample of Navy and Air Force airframes:

$$H_{100} = \ldots \qquad (4)$$

where $H_{100}$ = manufacturing labor hours required to produce the 100th airframe,
W = gross takeoff weight in pounds,
S = maximum speed in knots.

The multiple correlation coefficient is 0.98 and the coefficient of variation is .016 in logarithmic terms. Despite these satisfactory measures of fit, a comparison of the actual manufacturing hours for each airframe in the sample with those estimated by the equation provides a better understanding of how the relationship relates to the real world. In such a comparison, as shown by Table 2, 33 percent of the estimates differ from the actuals by more than 20 percent, and 7 percent differ by more than 30 percent. These figures imply that an analyst with only the estimating relationship on which to rely may or may not obtain a good estimate. However, if the less acceptable results can be explained in some way, the analyst is then in a much better position to understand the strengths and weaknesses of the equation.

Since this estimating relationship is based on gross takeoff weight and maximum speed, an initial hypothesis to explain the variations might be that the estimates decrease in quality at one end of the weight or speed range or in certain combinations of weight and speed. In this case, however, as shown in Fig. 4, the poorer estimates are scattered throughout the sample, which indicates no consistent bias because of the explanatory variables.

Table 2

COMPARISON OF ACTUAL AND ESTIMATED MANUFACTURING HOURS

Difference Between
Actual Hours and            Number of      Percentage
Estimated Hours (percent)   Airframes      of Sample

10 or less                     15              56
11-20                           3              11
21-30                           7              26
31-40                           2               7

[Fig. 4; axis label: Maximum speed]
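The percentages quoted from Table 2 can be rechecked from the airframe counts alone. The short Python sketch below is an illustration added here, not part of the original analysis; the variable names are assumptions, and the only inputs are the four counts shown in Table 2.

```python
# Table 2 counts: airframes grouped by the percentage difference between
# actual and estimated manufacturing hours.
counts = {"10 or less": 15, "11-20": 3, "21-30": 7, "31-40": 2}

total = sum(counts.values())  # number of airframes in the sample

# Percentage of the sample in each band, rounded to whole percent as in Table 2.
shares = {band: round(100 * n / total) for band, n in counts.items()}

# Share of estimates that miss by more than 20 and by more than 30 percent.
over_20 = shares["21-30"] + shares["31-40"]
over_30 = shares["31-40"]

for band, pct in shares.items():
    print(f"{band:>10}: {counts[band]:2d} airframes, {pct:2d} percent")
print(f"more than 20 percent off: {over_20} percent")
print(f"more than 30 percent off: {over_30} percent")
```

The two summary figures correspond to the 33 percent and 7 percent cited in the text, and the two highest bands together hold the nine airframes discussed in the second hypothesis.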

A second hypothesis might be that the manufacturing history of the airframes in the sample explains the discrepancies and, in general, this hypothesis is valid. Of the nine airframes in the sample for which estimates differed from actuals by 20 percent or more, several were considered problem airframes, i.e., airframes for which the manufacturer encountered an abnormal number of problems in meeting weight and performance specifications. Interestingly enough, these were not aircraft in which a major state-of-the-art advance was being attempted. Another cause for discrepancy was the interspersion of different models of the same aircraft in a single lot: For example, reconnaissance versions of a bomber were interspersed among bomber airframes. Situations of this kind increase direct labor requirements. The two airframes for which the estimates were the poorest and for which almost 40 percent less labor than the equation predicted was required, were vastly different ones: a large transport and a supersonic fighter. Production of one of these airframes benefited from the manufacturer's concurrent experience with a commercial airplane of similar configuration. The other case cannot be explained. The amount of labor involved in producing the airplane was unusually low.

Although it is not possible to resolve all uncertainties with the information available, an estimator can feel reasonably confident that the estimating relationship does not contain a systematic bias, that it should be applicable to normal production programs, and that it provides reasonable estimates throughout the breadth of the sample.

Hardware  Considerations 

The sample included aircraft having gross takeoff weights of 6100 lb to 450,000 lb and maximum speeds of 300 kn to 1200 kn. Suppose that a proposed new aircraft has a gross weight of 600,000 lb and a maximum speed of 1700 kn. Should Eq. (4) be used as the estimating equation in this case? The same question could arise for an aircraft with weight and speed that are in the sample range, but which is to be fabricated by a new process or out of a new material. Again, the estimator must decide whether the equation is relevant or how it can be modified to be useful. An estimating relationship can be used properly only by a person familiar with the type of equipment whose cost is to be estimated. To say that an analyst who estimates the cost of a destroyer should be familiar with the characteristics of destroyers is a truism; however, an estimator is sometimes far removed from the actual hardware. Further, he may be expected to provide costs for air-to-air missiles one week and for a new antiballistic missile system the next. The tendency in such a situation may be to use the equation that appears most appropriate without taking the required measures to determine whether the equation is applicable.

To illustrate the problem, assume that a new supersonic bomber is proposed having a gross weight of 450,000 lb and a maximum speed of 1700 kn. Equation (4) may be inappropriate because the speed is far beyond the range of the sample. On the other hand, no equation exists for aircraft in that speed range, and an estimate is required. This situation may be regarded as the normal one, and there is no choice but to use what is available. In this example, Eq. (4) gives 542,000 direct labor manufacturing hours.

The next step is to compare the result with other similar systems to see if the estimate appears reasonable. In this instance manufacturing hours versus gross weight are plotted for several other large aircraft as shown in Fig. 5. The supersonic bomber estimate SSB1 is substantially above the trend as it should be, because a 1700-kn airframe will be more difficult to build than a subsonic airframe of the same size. If other information is lacking, an estimator might accept the figure of 542,000 hr. In this case, however, all the airframes in the sample were fabricated almost entirely of aluminum; an airframe built to withstand the heat generated by sustained flight in the atmosphere at a speed of about Mach 3 will require a metal such as stainless steel or titanium. The question that occurs is whether the speed variable in the equation fully accounts for this change in technology.


One way to answer this question is to plot a second scatter diagram, with speed as the independent variable. Figure 6 shows labor hours per pound of airframe weight plotted against speed with a calculated line of best fit drawn through the scatter. If an airframe weight of 125,000 lb out of a gross weight of 450,000 lb is assumed, the estimate of 542,000 hr is equal to 4.3 hr/lb of airframe, which not only is below the calculated trend line, but is also below any reasonable trend line that can be drawn through the sample. (This point is shown as SSB1 in Fig. 6.)

Fig. 6 -- Labor hours per pound versus maximum speed (vertical axis: hours per pound of airframe; horizontal axis: maximum speed, hundreds of kn, 0 to 20)

Three possible estimates can now be considered: 542,000 hr based on speed and weight; about 300,000 hr based on weight alone as shown by Fig. 5; and about 925,000 hr based on speed alone as shown by the regression line in Fig. 6 (7.4 hr/lb x 125,000 lb = 925,000 hr). More information is needed to narrow the range.

Although data are less than abundant, several experimental and prototype aircraft have been fabricated using stainless steel and titanium. On the basis of prototype experience, one manufacturer maintains that a titanium airframe requires twice the number of hours that an aluminum airframe requires; however, manufacturing hours for an aluminum airframe can vary considerably. A second approach is more precise. An examination of actual data for different airframes with speeds of Mach 3 and above shows that these airframes require about 1.5 times as many hours as the estimating relationship of Eq. (4) indicates, which implies 813,000 hr or 6.5 hr/lb for the supersonic bomber. (This point is shown as SSB2 in Fig. 6.) On the basis of current knowledge, the estimate appears to be reasonable. Further measures could be taken in the form of another independent estimate that uses a different estimating relationship. An estimator does not have this option for most kinds of hardware, because estimating relationships are not plentiful. However, in the case of airframes, a number of equations have been developed over the years; it is good practice to use one to confirm an estimate made with another.
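The arithmetic behind the competing supersonic bomber estimates can be laid out in a few lines. The Python sketch below is illustrative only; the constant and function names are assumptions introduced here, and every number is taken from the text (the 542,000-hr result of Eq. (4), the assumed 125,000-lb airframe weight, the 7.4 hr/lb read from the regression line, and the 1.5 adjustment factor).

```python
AIRFRAME_WEIGHT_LB = 125_000   # airframe weight assumed in the text
EQ4_HOURS = 542_000            # Eq. (4) result quoted in the text


def hours_per_lb(hours, weight_lb=AIRFRAME_WEIGHT_LB):
    """Convert total manufacturing hours to hours per pound of airframe."""
    return hours / weight_lb


# Estimate based on speed alone: 7.4 hr/lb from the regression line in Fig. 6.
speed_only_hours = 7.4 * AIRFRAME_WEIGHT_LB

# Estimate adjusted for Mach 3 materials: 1.5 times the Eq. (4) result.
adjusted_hours = 1.5 * EQ4_HOURS

print(f"Eq. (4):       {EQ4_HOURS:>9,.0f} hr = {hours_per_lb(EQ4_HOURS):.1f} hr/lb")
print(f"speed alone:   {speed_only_hours:>9,.0f} hr = 7.4 hr/lb")
print(f"1.5 x Eq. (4): {adjusted_hours:>9,.0f} hr = {hours_per_lb(adjusted_hours):.1f} hr/lb")
```

The adjusted figure falls between the Eq. (4) estimate and the speed-only estimate, which is why the text treats it as reasonable on current knowledge.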

Judgment  in  Cost  Estimating 

The need for judgment is often mentioned in connection with the use of estimating relationships. Although this need may be self-evident, one of the problems in the past has been too much reliance on judgment and too little on estimating relationships. The problem of introducing personal bias with judgment has been studied in other contexts, but the conclusions are relevant to this discussion. In brief, a person's occupation or position seems to influence his forecasts. Thus, a consistent tendency toward low estimates appears among those persons whose interests are served by low estimates, e.g., proponents of a new weapon or support system whether in industry or in government. Similarly, there are people in industry and in government whose interests are served by caution. As a consequence, their estimates are likely to run higher than would be the case were they free from all external pressures. (In fairness to this latter group, however, overestimates are rare enough to suggest that caution is not a quality to be despised.)

The primary use of judgment should be to decide first, whether an estimating relationship can be used for an advanced system, and second, if so, what adjustments will be necessary to take into account the effect of a technology that is not present in the sample. Judgment is also required to decide whether the results obtained from an estimating relationship are reasonable. This does not mean reasonable according to a preconception of what the cost ought to be, but reasonable in a comparison with the past cost of similar hardware. A typical test for reasonableness is to study a scattergram such as Fig. 7 of costs of analogous equipment at some standard production quantity. The estimate of the article may be outside the trend lines of the scattergram and still be correct, but an initial presumption exists that a discrepancy has been discovered and that this discrepancy must be investigated. An analyst who emerges from his deliberations with an estimate implying that new, higher performance equipment can be procured for less than the cost of existing hardware knows that his task is not finished. If, after research, he is convinced that the estimate is correct, he should then be prepared to explain the new development that is responsible for the decrease in cost. He should not raise the cost arbitrarily by a percentage to make the figure appear more acceptable or because he feels that the estimate is too low. (Such adjustments are the province of management and are generally occasioned by reasons somewhat removed from those discussed here.) Judgments must be based on well-defined evidence. The only injunction to be observed is that any change in an estimate be fully documented to ensure that the estimate can be thoroughly understood, and to provide any information that may be needed to reexamine the equations in the light of the new data.

Fig. 7 -- Cost comparison of analogous equipment


V.  THE  LEARNING  CURVE 


FOR MANY YEARS the aerospace industry has made use of what variously have been called "learning," "progress," "improvement," or "experience" curves to predict reductions in cost as the number of items produced increases. The learning process is a phenomenon that prevails in many industries; its existence has been verified by empirical data and controlled tests. Although there are several hypotheses on the exact manner in which the learning or cost reduction can occur, the basis of learning-curve theory is that each time the total quantity of items produced doubles, the cost per item is reduced to a constant percentage of its previous cost. Alternative forms of the theory refer to the incremental (unit) cost of producing an item at a given quantity or to the average cost of producing all items up to a given quantity. For example, if the cost of producing the 200th unit of an item is 80 percent of the cost of producing the 100th item, and if the cost of the 400th unit is 80 percent of the cost of the 200th, and so forth, the production process is said to follow an 80-percent unit learning curve. If the average cost of producing all 200 units is 80 percent of the average cost of producing the first 100 units, the process follows an 80-percent cumulative average learning curve.


The quantities mentioned in connection with the learning concept presuppose the inclusion of all items. As concerns the J-79 engine used on the F-4 airplane, one would expect engine costs for the first 100 F-4s to be more than that for the second 100 airplanes. Although this is true, what is important is that the J-79 has been used on several other types of aircraft, and these uses, including full spare engines, must be considered in learning-curve analysis.

Either formulation of the theory results in a power function that is linear on logarithmic grids. Figure 1 shows a unit curve for which the reduction in cost is 20 percent with each doubling of cumulative output, the upper figure showing the curve on arithmetic grids and the lower on logarithmic grids. The arithmetic plot illustrates that the percentage reduction in cost in each unit is very pronounced for the early units. On an 80-percent curve, for example, cost decreases to 28 percent of the original value over the first 50 units. Over the next 50 units, it declines only 5 more percentage points, i.e., down to 23 percent of unit 1 cost. The factors that account for the decline in unit cost as cumulative output increases are numerous and not completely understood. Those most commonly mentioned are

1. Job familiarization by workmen, which results from the repetition of manufacturing operations.

2. General improvement in tool coordination, shop organization, and engineering liaison.

3. Development of more efficiently produced subassemblies.

4. Development of more efficient parts-supply systems.

5. Development of more efficient tools.

6. Substitution of cast or forged components for machined components.

7. Improvement in overall management.

The above list of relevant factors is not complete, and it tends to understate the importance of the item sometimes considered the most important: labor learning. Labor cost, however, cannot decline through experience gained by workmen unless management also becomes more efficient. In other words, it is necessary for management to organize and coordinate more efficiently the work of all manufacturing departments so that parts and assemblies will flow smoothly through the plant.

Labor cost is not the only element of manufacturing that declines as cumulative output increases. A learning curve exists for unit materials cost. The materials category frequently includes much purchased equipment, which in turn includes a substantial number of engineering, tooling, and labor hours. Unit hours decline as production quantities increase, and the contractor who buys in successive lots is generally able to negotiate a lower price for each lot. Decreases in raw material costs are generally attributed to two factors as cumulative output increases: The workmen learn to work the raw materials more efficiently, cutting down spoilage and reducing the rejection rate, and management learns to order materials from suppliers in shapes and sizes that reduce the amount of scrap that must be shaved and cut from the pieces of sheet or bar to fabricate the item of equipment. Substitution of forgings for machined parts also reduces the amount of scrap material.

[Figure axis label: Unit labor cost]

A second factor that is probably responsible to a lesser extent for the decline in materials cost is the pricing policy of the raw material suppliers. These suppliers generally reduce the price per pound for the various kinds of raw materials if an order is sufficiently large. Although the learning curve pertains to cost reductions as materials are applied to successive lots and not to reductions due to volume purchases, segregation of the two effects is imperfect. This may account for differences observed in learning-curve slopes.

A third major component of cost, overhead, also declines with cumulative output, but as a result of the method of allocating overhead and not because of a perceptible relationship between overhead rate and cumulative output. Direct labor hours per unit decline as cumulative output increases, and overhead is distributed to each unit on the basis of direct labor cost or hours. As a consequence, it is inappropriate to discuss a learning curve for this element of cost.


The  Log-linear  Hypothesis 


The relationship between cost and quantity may be represented by a power (log-linear) equation of the form

$$y = a x^{b},$$

where x equals the cumulative production quantity. The relationship corresponds to a unit or a cumulative average learning curve according to whether y is the cost of the xth unit or the average cost of the first x units. The constant a is the cost of the first unit produced. The exponent b, which measures the slope of the learning curve, bears a simple relationship to the constant percentage to which cost is reduced as the quantity is doubled. If S represents the fraction to which cost decreases when quantity doubles, the equation becomes

$$b = \frac{\log S}{\log 2}.$$

This equation shows that for a value of S equal to 75 percent,* the corresponding value of b is

$$\frac{\log .75}{\log 2} \quad \text{or} \quad -.415.$$

* In learning-curve literature, the term slope often refers to this percentage reduction; e.g., a 75-percent slope means a curve with a b value of -.415.

Log-linear Unit Curve

If a production process follows a unit learning curve of the form $y_u = a x^{b}$, the cumulative cost T of producing the first n units is

$$T = a \sum_{x=1}^{n} x^{b}.$$

The cumulative average cost of producing the first n units is then

$$y_c = \frac{T}{n} = \frac{a}{n} \sum_{x=1}^{n} x^{b}.$$

The relationship between the unit curve and the cumulative average curve is shown by Fig. 2. The function $y_c$ is not log-linear; however, as x becomes larger, $y_c$ approaches asymptotically the value

$$\frac{a}{b + 1}\, x^{b},$$

which differs from the expression for unit cost only by the constant factor 1/(b + 1). Consequently, if unit cost has been estimated at a sufficiently large quantity, the cumulative average cost for the same quantity may be approximated by multiplying the unit measure by 1/(b + 1).*
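The relationships for the log-linear unit curve can be checked numerically. The following Python sketch is an illustration under the definitions given above (first-unit cost a, slope fraction S, exponent b); the function names are assumptions introduced here and do not come from the original text.

```python
import math


def slope_exponent(s):
    """Exponent b for a curve whose cost falls to the fraction s at each doubling."""
    return math.log(s) / math.log(2)


def unit_cost(x, s, a=1.0):
    """Cost of the x-th unit on a log-linear unit curve: a * x**b."""
    return a * x ** slope_exponent(s)


def cumulative_total(n, s, a=1.0):
    """Cumulative cost T of the first n units: a * sum of x**b for x = 1..n."""
    b = slope_exponent(s)
    return a * sum(x ** b for x in range(1, n + 1))


def cumulative_average(n, s, a=1.0):
    """Cumulative average cost of the first n units: T / n."""
    return cumulative_total(n, s, a) / n


print(f"b for a 75-percent curve: {slope_exponent(0.75):.3f}")
print(f"80-percent curve, unit 50:  {unit_cost(50, 0.80):.2f} of first-unit cost")
print(f"80-percent curve, unit 100: {unit_cost(100, 0.80):.2f} of first-unit cost")
print(f"unit 200 / unit 100:        {unit_cost(200, 0.80) / unit_cost(100, 0.80):.2f}")
```

The values for units 50 and 100 correspond to the 28 and 23 percent figures quoted earlier for an 80-percent curve, and the last line confirms the constant ratio at each doubling.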

Log-linear  Cumulative  Average  Curve 

When a production process follows a log-linear cumulative average curve rather than a unit curve, the basic functional form is still $y = a x^{b}$ but can be written $y_c = a x^{b}$, where $y_c$ is the average cost of the first x units. The cumulative cost for producing x units is simply $y_c x$, or $a x^{b+1}$, and the unit cost is obtained from the function

$$y_u = a\left[x^{b+1} - (x - 1)^{b+1}\right].$$

The relationship between a linear cumulative average curve and the resulting unit curve is illustrated in Fig. 3. The unit curve is not log-linear; however, as x becomes larger, $y_u$ quickly approaches asymptotically the value

$$(b + 1)\, a x^{b},$$

which differs from the cumulative average cost equation only by the constant factor (b + 1).

* Whether a quantity is sufficiently large for the asymptotic method to provide a good approximation depends on the slope of the learning curve. For a 90-percent curve, the asymptotic method produces an error of about 1 percent at quantity 100; for a 75-percent curve, the error at quantity 100 is almost 5 percent and does not decrease to 1 percent until a quantity of almost 2000 has been reached.

These equations may appear cumbersome, but in practice much of the work involved in using learning curves has been simplified by the preparation of tables giving the relationship between cumulative totals, cumulative average, and unit cost for a range of slopes and quantities. Table 1 gives values for these relationships for a 70-percent curve when a, the cost of the first unit produced, is equal to 1. To illustrate how such a table is used, assume a log-linear unit curve and a quantity n of 20 units. The total cost of 20 units is approximately 7.4, the cumulative average cost of 20 units is .37, and the cost of the 20th unit is .214, in terms of the cost of the first unit. The unit cost of .214 appears in the dual-headed column, $y_u$, $y_c$, since a log-linear unit curve is assumed. If a log-linear cumulative average cost curve is assumed, this column presents the cumulative average cost. One column serves to present both log-linear unit and log-linear cumulative average since the functional form of the equation, $y = a x^{b}$, is the same in either case.

Fig. 2 -- Log-linear unit curve (80-percent slope)

Fig. 3 -- Log-linear cumulative average curve (80-percent slope); axis label: cost per unit
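The size of the error introduced by the asymptotic method can be examined directly. The Python sketch below is an illustration only; it assumes a log-linear unit curve with first-unit cost of 1, compares the exact cumulative average with the approximation x^b/(b + 1), and uses a function name introduced here for the purpose.

```python
import math


def asymptotic_error(s, n):
    """Relative error of the asymptotic cumulative average at quantity n.

    s is the slope fraction (e.g., 0.75 for a 75-percent curve). The exact
    cumulative average is the mean of x**b over units 1..n; the asymptotic
    value is n**b / (b + 1).
    """
    b = math.log(s) / math.log(2)
    exact_avg = sum(x ** b for x in range(1, n + 1)) / n
    approx_avg = n ** b / (b + 1)
    return (approx_avg - exact_avg) / exact_avg


for slope in (0.90, 0.75):
    for quantity in (100, 2000):
        err = 100 * asymptotic_error(slope, quantity)
        print(f"{slope:.0%} curve, quantity {quantity:>4}: error {err:.1f} percent")
```

Under these assumptions the steeper curve shows the larger error at a given quantity, which is the point made in the footnote on the asymptotic method.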

In practice, the unit cost is most frequently considered to be linear, but there are sufficient exceptions to suggest that the choice must be based on past experience. Once the choice is made, however, it is of the utmost importance to apply the technique consistently. As is evident from Table 1, large errors could result if one type of curve was confused with the other.

Table 1

70-PERCENT CURVE DATA

          Log-linear Unit                 Unit or         Log-linear Cumulative Average
          Cumulative     Cumulative       Cumulative                     Cumulative
Unit      Total          Average          Average         Unit           Total
Number    T              y_c              y_u, y_c        y_u            y_c x

  1        1.000000      1.000000         1.000000        1.000000       1.000000
  2        1.700000      0.850000         0.700000        0.400000       1.400000
  3        2.268180      0.756060         0.568180        0.304541       1.704541
  4        2.758180      0.689545         0.490000        0.255459       1.960000
  5        3.195027      0.639005         0.436846        0.224232       2.184232
  6        3.592753      0.598792         0.397726        0.202125       2.386357
  7        3.960150      0.565736         0.367397        0.185419       2.571777
  8        4.303150      0.537894         0.343000        0.172223       2.744000
  9        4.625979      0.513998         0.322829        0.161460       2.905460
 10        4.931771      0.493177         0.305792        0.152465       3.057925
 11        5.222928      0.474812         0.291157        0.144802       3.202727
 12        5.501336      0.458445         0.278408        0.138173       3.340900
 13        5.768511      0.443732         0.267174        0.132365       3.473266
 14        6.025688      0.430406         0.257178        0.127222       3.600487
 15        6.273896      0.418260         0.248208        0.122626       3.723113
 16        6.513996      0.407125         0.240100        0.118487       3.841600
 17        6.746721      0.396866         0.232726        0.114734       3.956334
 18        6.972702      0.387372         0.225980        0.111310       4.067644
 19        7.192481      0.378552         0.219780        0.108171       4.175816
 20        7.406536      0.370327         0.214055        0.105279       4.281095
 21        7.615284      0.362633         0.208748        0.102604       4.383699
 22        7.819094      0.355413         0.203810        0.100119       4.483818
 23        8.018295      0.348622         0.199201        0.097804       4.581622
 24        8.213180      0.342216         0.194886        0.095639       4.677261
 25        8.404015      0.336161         0.190835        0.093609       4.770870
 26        8.591037      0.330425         0.187022        0.091702       4.862572
 27        8.774462      0.324980         0.183425        0.089904       4.952476
 28        8.954487      0.319803         0.180024        0.088206       5.040682
 29        9.131290      0.314872         0.176803        0.086600       5.127282
 30        9.305035      0.310168         0.173745        0.085076       5.212359
 31        9.475873      0.305673         0.170838        0.083629       5.295988
 32        9.643943      0.301373         0.168070        0.082252       5.378240
 33        9.809373      0.297254         0.165430        0.080940       5.459180
 34        9.972281      0.293302         0.162908        0.079687       5.538867
 35       10.132777      0.289508         0.160496        0.078490       5.617357
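Any row of Table 1 can be regenerated from the equations of the preceding subsections. The Python sketch below is illustrative; it assumes a first-unit cost of 1 and a 70-percent slope as in the table, and the function name and the order of the returned values are choices made here to match the table columns.

```python
import math


def table_row(n, s=0.70):
    """Return the five Table 1 entries for unit number n on an s-slope curve.

    Order: cumulative total and cumulative average for a log-linear unit
    curve, the shared x**b column, then unit cost and cumulative total for
    a log-linear cumulative average curve. First-unit cost is taken as 1.
    """
    b = math.log(s) / math.log(2)
    shared = n ** b                                     # y_u or y_c, depending on the curve assumed
    unit_total = sum(x ** b for x in range(1, n + 1))   # T for the log-linear unit curve
    unit_cum_avg = unit_total / n                       # y_c for the log-linear unit curve
    cumavg_total = n ** (b + 1)                         # y_c * x for the cumulative average curve
    cumavg_unit = cumavg_total - (n - 1) ** (b + 1)     # y_u for the cumulative average curve
    return unit_total, unit_cum_avg, shared, cumavg_unit, cumavg_total


print("  ".join(f"{value:.6f}" for value in table_row(20)))
```

For the 20th unit, the same shared value is the unit cost under one assumption and the cumulative average under the other, while the cumulative totals differ by a wide margin, which illustrates the danger of confusing the two types of curve.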


Throughout  this  section  it  will  be  assumed  that  the  log- linear 
hypothesis  applies,  i.e.,  that  the  learning  curve  is  linear  when  plot¬ 
ted  on  logarithmic  grids.  It  must  be  mentioned,  however,  that  this  is 
not  the  only  possible  formulation  of  the  learning  curve.  A  number  of 
studies  have  suggested  that  the  curve  is  not  log-linear.  One  of  the 
best  known  of  these  is  the  Stanford  Research  Institute  investigation 
of  20  World  War  II  aircraft.  The  study  proposed 

y  = 

V  X  +  B 


as  a  more  reliable  expression  of  the  relationship  between  man-hour 
cost  and  cumulative  output.  The  decision  to  find  a  substitute  function 
was  apparently  prompted  by  a  visual  inspection  of  several  series  that 
seemed  to  indicate  a  concavity  when  viewed  from  below  in  the  unit  learn- 
ing  curve.  This  concavity  has  been  recognized  independently  in  other 
studies . 

* In this context, concavity means that when plotted on logarithmic
grids the curve declines at an increasingly steep slope as it moves
away from the y-axis. In the formulation y = a(x + B)^b the curve
becomes essentially linear as x becomes large relative to B.

However, in some cases both the labor and production cost curves
develop convexities beyond certain values of cumulative output. In the
theory of a linear unit curve, it is implicitly assumed that constituent
curves (fabrication, subassembly, and major and final assembly) are
parallel to the linear unit curve, implying that the rate of learning
on all production jobs in all departments is the same. It is to be
expected, however, that the departmental learning curves could have
different slopes from each other (e.g., fabrication, 80 percent;
subassembly, 75 percent; and major and final assembly, 70 percent).
The sum of these curves (the unit curve) would be convex when viewed
from below and approach as a limit the flattest of the departmental
curves.
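The behavior of a summed unit curve can be checked numerically. The sketch below is illustrative only: it uses the three departmental slopes from the example above, and the first-unit hours assigned to each department are hypothetical (equal values chosen for simplicity). It reports the effective percent slope of the summed curve between unit x and unit 2x.

```python
import math

def exponent(slope_pct):
    """Exponent b in y = a * x**b for a curve with the given percent slope."""
    return math.log(slope_pct / 100.0) / math.log(2.0)

# (first-unit hours, percent slope); the first-unit hours are hypothetical.
DEPARTMENTS = {
    "fabrication": (100.0, 80.0),
    "subassembly": (100.0, 75.0),
    "major and final assembly": (100.0, 70.0),
}

def unit_hours(x):
    """Total unit hours at cumulative unit x, summed over departments."""
    return sum(a * x ** exponent(s) for a, s in DEPARTMENTS.values())

def effective_slope_pct(x):
    """Percent slope of the summed curve between unit x and unit 2x."""
    return 100.0 * unit_hours(2 * x) / unit_hours(x)

for x in (1, 10, 100, 1000, 10**6):
    print(f"unit {x:>7}: effective slope {effective_slope_pct(x):.1f}%")
```

Under these assumptions the effective slope starts at 75 percent and drifts toward, without reaching, the 80 percent of the flattest departmental curve, which is the convexity described in the text.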

Much literature is available describing the bases for, and hypotheses
about, learning curves, and it is beyond the scope of this section
to attempt to cover this background material in any detail. For this
discussion, it is stipulated that the learning curve is a useful and
accepted estimating tool, particularly in the aerospace industry, that
the log-linear curve is the one most commonly used, and that a
knowledge of its mechanics is indispensable to persons making or using
cost estimates.


Plotting  a  Curve 


In the graphical display of learning curves, the problem is to
represent the average cost for a lot or a complete contract, since,
typically, man-hours or costs are not recorded by unit. See, for
example, the following table:


                     Manufacturing
Lot      Units       Hours per Lot

 1       1-10             5,830
 2       11-20            4,370
 3       21-50           10,550
 4       51-100          14,750

There is one subject that is not discussed in the literature:
the effect of production rate on unit cost. Economic theory generally
holds that this relationship can be described by a U-shaped function:
first, cost declines as production rate increases; next, it is
insensitive to rate over some range; and eventually, it begins to rise
again. In learning-curve applications, on the other hand, it is assumed
implicitly that cost is not affected by rate of output (or that the
rate is constant). Empirical evidence of the interaction between the
volume and rate effects is scanty. For further discussion, see Lee E.
Preston and E. C. Keachie, "Cost Functions and Progress Functions: An
Integration," American Economic Review, Vol. 54, No. 2, Part I, March
1964, pp. 100-107.




To plot a cumulative average curve from these data, the cumulative
average hours are computed at the final unit in each lot:


              Manufacturing                       Cumulative
Plot Point    Hours per Lot    Computation       Average Hours

    10             5,830        5,830 ÷ 10            583
    20             4,370       10,200 ÷ 20            510
    50            10,550       20,750 ÷ 50            415
   100            14,750       35,500 ÷ 100           355

The  cumulative  average  at  the  10th  unit  is  583  hours;  this  is  the  first 
plot  point.  Successive  plot  points  are  at  the  end  of  each  lot,  since 
these  are  the  points  where  the  cumulative  average  hour  figures  apply. 
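The cumulative average computation can be written out as a short sketch. The lot data are those of the table above; the function and variable names are illustrative only.

```python
# (last unit of lot, manufacturing hours for the lot), from the table above
LOTS = [(10, 5830), (20, 4370), (50, 10550), (100, 14750)]

def cumulative_average_points(lots):
    """Return (plot point, cumulative average hours) at the end of each lot."""
    points = []
    running_total = 0
    for last_unit, lot_hours in lots:
        running_total += lot_hours          # hours for all units to date
        points.append((last_unit, running_total / last_unit))
    return points

for unit, average in cumulative_average_points(LOTS):
    print(f"unit {unit:>3}: cumulative average {average:.0f} hours")
```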

To plot the unit curve it is first necessary to compute the unit
hours and then to establish plot points. The unit hours can be taken
as an average for each lot:


                           Unit
Lot     Computation        Hours

 1       5,830 ÷ 10         583
 2       4,370 ÷ 10         437
 3      10,550 ÷ 30         352
 4      14,750 ÷ 50         295
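As a sketch of the same arithmetic, the lot averages can be computed alongside the arithmetic midpoint of each lot, which is the naive candidate for a plot position. The names used here are illustrative only.

```python
# (first unit, last unit, manufacturing hours for the lot)
LOTS = [(1, 10, 5830), (11, 20, 4370), (21, 50, 10550), (51, 100, 14750)]

def lot_unit_hours(lots):
    """Average unit hours and arithmetic midpoint for each lot."""
    rows = []
    for first, last, hours in lots:
        quantity = last - first + 1
        rows.append({
            "quantity": quantity,
            "unit_hours": hours / quantity,
            "arithmetic_midpoint": (first + last) / 2,
        })
    return rows

for row in lot_unit_hours(LOTS):
    print(row["quantity"], round(row["unit_hours"]), row["arithmetic_midpoint"])
```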


The lots can be represented by these unit hour values. The question
is, where should the values be plotted? To plot at the lot arithmetic
midpoint is to assume that the learning curve can be approximated by a
linear curve on arithmetic grids, but as suggested by Fig. 1 such a
method of approximation only becomes reasonable for lots following a
large number of previous units. Thus, when dealing with a log-linear
function, the arithmetic midpoint plot produces the unequal
distribution of the area under the curve, as shown in Fig. 4.

The true midpoint is defined as that unit which represents the
entire lot and which must also reflect the average unit cost of the
lot. The total cost (or total hours) of the lot is equal to the
product of this average unit cost and the number of units in the lot,
n. This product will approximate the area under the curve for n units
(see Fig. 5).*


* If n represents only integers, the limits of the area must be
modified. (See H. Asher, Cost-Quantity Relationships in the Airframe
Industry, The Rand Corporation, R-291, July 1, 1956, pp. 34-38.)

Fig. 4 — Learning curve on arithmetic grids

Fig. 5 — True lot midpoint on arithmetic grids


Note that if the area under the curve is equal to the product of the
average unit cost and the lot quantity, the two crosshatched areas in
the figure must be equal. In fact, the exact determination of a true
lot plot point for plotting purposes depends on (1) the lot quantity;
(2) the type of curve hypothesized, i.e., whether the unit curve or the
cumulative average curve is log-linear; and (3) the true value of the
slope. Therefore, these values must be known or assumed. The first,
the lot quantity, will be known. The second, the nature of the curve,
must be assumed. The third, the true value of the slope, is actually
never known, and is usually approximated based on prior experience.

It is possible to ascertain the exact lot plot points for each type
of curve over a range of slopes and quantities. However, because of
the assumptions mentioned above that will usually have to be made
regarding both the type of curve and its approximate slope, in most
situations there is little need to strive for extreme accuracy. The
following discussion provides methods of approximation that do not
involve the complicated calculations required to derive the true lot
plot point.

As illustrated in Fig. 5, the average cost for the lot is also the
unit cost at the lot plot point. Therefore, tables similar to Table 1
can be used to derive acceptably accurate plot points.

To illustrate, assume a log-linear unit curve of 70 percent, a first
lot of 10 units, and a first unit cost of 1. Then, the cumulative
average cost of the first 10 units is .493. This average cost lies
between unit cost values of .568 and .490, i.e., between units 3 and
4 on the unit curve. Arithmetic interpolation yields a value of
slightly less than 4, which is the plot point for this particular
lot when a 70-percent log-linear unit curve is assumed. An exact
solution to the plot point equation would show the true plot point for
a 70-percent curve to be 3.95. Similarly, if the first unit cost is 1
and if a 70-percent log-linear cumulative average curve is assumed,
data from Table 1 yield a plot-point approximation of slightly less
than 3 (the cumulative average cost for 10 units is .306, which lies
between unit cost values of .400 and .304, i.e., between units 2 and 3
on the unit curve); the true plot point is 2.98. In this example, the
plot points vary because of the assumption that one or the other of the
curves is log-linear. This method of approximation produces accurate
first-lot plot points for all but very small lot sizes. As a general
rule, the steeper the slope and the smaller the lot size, the less
accurate this approximation method becomes.
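The first-lot example can be reproduced without tables. The following sketch assumes a 70-percent slope and a first unit cost of 1, as in the text; the function names are illustrative. The "exact" value is found here by bisection, on the assumption that the unit cost at a fractional unit x under a log-linear cumulative average curve is the difference in cumulative total between x and x - 1.

```python
import math

SLOPE = 0.70
B = math.log(SLOPE) / math.log(2.0)   # exponent of the 70-percent curve

def unit_cost_unit_curve(x):
    """Unit cost at x when the unit curve is log-linear (first unit = 1)."""
    return x ** B

def unit_cost_cum_avg_curve(x):
    """Unit cost at x when the cumulative average curve is log-linear."""
    return x ** (1 + B) - (x - 1) ** (1 + B)

def lot_average(unit_cost, first, last):
    """Average unit cost over units first..last."""
    return sum(unit_cost(u) for u in range(first, last + 1)) / (last - first + 1)

def interpolated_plot_point(unit_cost, average, search_to=1000):
    """Linear interpolation between the two integer units bracketing `average`."""
    for u in range(1, search_to):
        high, low = unit_cost(u), unit_cost(u + 1)
        if high >= average >= low:
            return u + (high - average) / (high - low)
    raise ValueError("average not bracketed")

def exact_plot_point(unit_cost, average, lo=1.0, hi=1000.0):
    """Bisection on the (decreasing) unit-cost function."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if unit_cost(mid) > average:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for name, curve in [("unit curve", unit_cost_unit_curve),
                    ("cumulative average curve", unit_cost_cum_avg_curve)]:
    avg = lot_average(curve, 1, 10)
    print(name, round(avg, 3),
          round(interpolated_plot_point(curve, avg), 2),
          round(exact_plot_point(curve, avg), 2))
```

Under these assumptions the interpolated values fall just below 4 and just below 3, and the bisection values agree with the 3.95 and 2.98 quoted above.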

For the successive lots following a preceding quantity, the same
procedure can be used for approximating plot points. To illustrate,
again using Table 1, assume that a quantity of 10 units follows the
first lot of 10 units. If a 70-percent log-linear unit curve and a
first unit cost of 1 are assumed, the total cost of the second lot may
be obtained by subtracting 4.93 (the total cost of the first 10 units)
from 7.4 (the total cost of 20 units), a difference of 2.47. This
represents an average cost of .247 for the 10 items in the lot. This
value falls between units 15 and 16 on the unit curve, and simple
interpolation gives a value of 15.1 for the plot point. If a log-linear
cumulative average curve is assumed, the approximation value of the
plot point is also 15.1. In other words, from Table 1, the difference
between the cumulative totals for 20 and 10 units, 4.28 and 3.06,
respectively, is 1.22, or an average of .122 for the 10 units in the
lot. This unit cost lies between .1226 and .1185, or units 15 and 16 on
the unit curve.
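The follow-on lot calculation can be sketched in the same way. The code below assumes a 70-percent slope and a first unit cost of 1, works from cumulative totals as the text does, and interpolates between integer units; the function names are illustrative.

```python
import math

B = math.log(0.70) / math.log(2.0)   # exponent of the 70-percent curve

def total_cost_unit_curve(n):
    """Cumulative total cost of n units when the unit curve is log-linear."""
    return sum(u ** B for u in range(1, n + 1))

def total_cost_cum_avg_curve(n):
    """Cumulative total cost of n units when the cumulative average is log-linear."""
    return n ** (1 + B)

def follow_on_plot_point(total_cost, first, last):
    """Return (lot average cost, interpolated plot point) for units first..last."""
    quantity = last - first + 1
    average = (total_cost(last) - total_cost(first - 1)) / quantity
    for u in range(first, last):
        high = total_cost(u) - total_cost(u - 1)      # cost of unit u
        low = total_cost(u + 1) - total_cost(u)       # cost of unit u + 1
        if high >= average >= low:
            return average, u + (high - average) / (high - low)
    raise ValueError("average not bracketed within the lot")

for name, total in [("unit curve", total_cost_unit_curve),
                    ("cumulative average curve", total_cost_cum_avg_curve)]:
    average, point = follow_on_plot_point(total, 11, 20)
    print(f"{name}: lot average {average:.3f}, plot point {point:.1f}")
```

Both assumptions place the plot point for units 11 through 20 at about 15.1, in line with the worked example.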

Tables to permit computation of lot plot points for a range of
slopes and lot quantities are available in the literature.* In
addition, an easier-to-use, but less accurate, approximation method
will be discussed that provides plot points for early lot quantities of
less than 100.

* See, for example, H. E. Boren and H. G. Campbell, Learning Curve
Tables, Vols. 1-3, The Rand Corporation, RM-6191-PR, to be issued.

Figure 6 presents an approximation of the plot point for the first
lot. It illustrates that substantial errors are possible when deriving
first-lot plot points. The abscissa represents first-lot quantity and
the ordinate the first-lot plot points associated with each quantity.
For the upper dashed curve, a 95-percent log-linear unit curve is
assumed; for the upper solid line, a 95-percent log-linear cumulative
average curve is assumed. Similarly, for the lower lines, 65-percent
curves are assumed. Approximation methods suitable for one type of
curve cannot be used for another type unless extremely large quantities
are dealt with, i.e., well beyond those shown in the figure. Figure 6
also shows the greater sensitivity to slope exhibited by the log-linear
cumulative average curve for moderately small first lots. In addition,
it affords an opportunity to approximate quickly the range of error
that can be introduced by inappropriate plotting of the cost of the
first lot.

Fig. 6 — First-lot midpoints versus first-lot quantities

Figure 7 gives plot points for follow-on lots. These points
represent an average of the range obtained from 65- to 95-percent
curves and the range obtained from a log-linear unit or a log-linear
cumulative average curve. The graph is used as follows:

1. The first unit of the contract lot is found on the 45-deg line.

2. The curve extending out from this unit is followed to the
   point on the horizontal axis that represents the last unit of
   the lot.

3. The plot point is read off the vertical axis at that point.
   Thus, for a lot of 10 units following 10 previous units, the
   plot point would be slightly over 15.

In  practice,  plot  points  for  only  the  first  two  or  three  lots,  if  these 
comprise  more  than  about  25  units,  need  be  taken  from  the  graph.  For 
succeeding  lots,  the  arithmetic  lot  midpoint  is  usually  adequate. 
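A rough check of when the arithmetic midpoint becomes adequate is sketched below. It assumes a log-linear unit curve with an 80-percent slope, which is a hypothetical value chosen only for illustration, and reuses the lot structure of the earlier table. The plot point is taken as the unit whose cost on the curve equals the lot average.

```python
import math

ASSUMED_SLOPE = 0.80   # hypothetical; chosen only for illustration
B = math.log(ASSUMED_SLOPE) / math.log(2.0)

LOTS = [(1, 10), (11, 20), (21, 50), (51, 100)]

def compare(first, last):
    """Return (plot point, arithmetic midpoint, relative gap) for one lot."""
    quantity = last - first + 1
    average = sum(u ** B for u in range(first, last + 1)) / quantity
    plot_point = average ** (1.0 / B)   # unit whose cost equals the lot average
    midpoint = (first + last) / 2.0
    return plot_point, midpoint, (midpoint - plot_point) / plot_point

for first, last in LOTS:
    plot_point, midpoint, gap = compare(first, last)
    print(f"units {first:>3}-{last:<3}: plot point {plot_point:6.2f}, "
          f"midpoint {midpoint:5.1f}, gap {100 * gap:4.1f}%")
```

Under this assumption the midpoint overstates the first-lot plot point by roughly a third, while the gap for the later lots is only a few percent.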

As a further illustration, Fig. 8 shows two sets of curves. The
lower set of curves was constructed from a series of small contract
lots, 10, 29, and 31 units. The upper set of curves was based on two
large contract lots, 100 and 500 units. With lot average costs, the
costs were plotted (1) at lot quantity arithmetic midpoints, (2) at
plot points where a log-linear unit curve for 65- and 95-percent slopes
was assumed, and (3) at plot points where a log-linear cumulative
average curve for 65- and 95-percent slopes was assumed.

From Fig. 8 it can be seen that the distance between the unit curve
constructed with the arithmetic midpoint and the unit curve constructed
with the true plot points depends on the size of the lot quantity. The
larger the lot quantity, the greater the distance between the midpoint
line and the other lines. In both sets the unit curves exhibit the
widest variation for the first lot. However, for a series of small
contract lots the range of plot points is of interest only for the
first few lots. The midpoint of even the second-lot quantity may often
provide a good approximation of the unit curve.

It is not the purpose of this discussion to recommend any particular
technique. Rather, it is to underline that plotting representative
unit costs for contract lots is of importance. The gross misplacement
of early points could lead to improper conclusions about cost-quantity
relationships.


Variations 


The examples used earlier tend to suggest that data points
generally fall along a straight line, as one would expect from the
log-linear hypothesis. The truth is that plots of the type illustrated
in Fig. 9 are not unusual and that fitting a curve to these points is
more than a matter of understanding the least-squares method of curve
fitting.

The types of plots in Fig. 9 are common enough to have been given names
by the airframe industry. The "scallop" is generally caused by a model
change or some other major interruption in the production process.
Characteristic of a scallop is the abrupt rise in manufacturing hours,
followed by a rapid decline, the basic slope of the curve remaining
relatively unchanged. When a model change is sufficiently great, as in
the case of the change to the F-106B from the F-106A, the result is not
a scallop but a change to a new curve. In this case, a "level-off" or
"follow-on" is characteristic of the initial portion of the new curve.
This is attributed to learning from a previous model that carries over
and flattens the curve during initial production. Such an effect can
also occur when production is halted for a long period or when
production is transferred to a new facility.

Fig. 9 — Illustrative examples of learning-curve slopes

To "bottom-out" is the tendency for a learning curve to flatten
at high production quantities. It seems reasonable that at some point
no further learning should occur or that whatever slight learning does
occur would be offset by the effect of other factors. In addition, it
can be established empirically that bottoming-out has occurred in a
number of cases. There are those who argue, however, that learning
can continue indefinitely, or at least as long as the attempt is made
to obtain man-hour reductions. The classic case relates to the
assembly of candy boxes, in which operation the learning curve was
found to have continued for the preceding 16 years when 16 million
boxes were assembled by one person.* The problem for the estimator, of
course, is that while bottoming-out may occur in any given case, it is
difficult to predict where it will occur. One study found that for the
sample of airframes examined it was fairly typical for flattening to
begin at the 300th unit,** but in the past this has not been true for
many airframes. The B-17 curve maintained a 70-percent slope out to
the 6000th unit and then exhibited a toe-up.***

"Toe-ups"  and  "toe-downs"  are  the  names  given  to  the  rather  sharp 
rises  or  falls  in  hours  that  sometimes  occur  at  the  end  of  a  production 
series.  The  upward  trend  has  been  explained  as  resulting  from  the 
transfer  of  experienced  workers  to  other  production  lines,  an  increase 
in the amount of handwork as machines are disassembled, failure to
replace or repair worn tooling at the normal rate, tool disassembly, or
a  production  lag  at  the  end  of  a  program  to  forestall  unemployment. 
Toe-downs  are  thought  to  be  caused  by  fewer  engineering  changes  at  the 
end  of  a  production  run  and  also  by  the  ability  of  the  manufacturer  to 
salvage  certain  items  fabricated  in  previous  lots. 

* Glen E. Ghormley, "The Learning Curve," Western Industry (now
Western Manufacturing), September 1952, pp. 31-34.

** Planning Research Corporation, Methods of Estimating Fixed-wing
Airframe Costs, Vol. I (Revised), PRC R-547A, Los Angeles, April 1967.

*** Glenn M. Brewer, The Learning Curve in the Airframe Industry,
School of Systems & Logistics, Air Force Institute of Technology,
Report SLSR-18-65, August 1965.

It is important to realize that such variations in production do
occur, and not occasionally but frequently. In the analysis of
man-hour or cost data, use of the unit curve reveals these variations,
and for this reason the unit curve is generally preferred. The
cumulative average curve tends to smooth out aberrations to such an
extent that even major changes can be obscured, as shown in Fig. 10.
The data points are taken from a fighter aircraft production program
that had more than its share of problems. The solid line shows how a
cumulative average curve dampens the effect of these problems.

Fig. 10 — Smoothing effect of cumulative average curve

The choice between working with the unit or the cumulative average
curve depends on the problem. The unit curve better describes the data
and is therefore preferred for analysis. In addition, its use can aid
the cost analyst in determining whether the basic curve is best
represented by a log-linear cumulative average or unit function, what
slope is most appropriate, and what follow-on projections can be made.
The log-linear cumulative average curve is widely preferred in
predictive models because of its computational simplicity, i.e., the
cost of n items is simply the cumulative average cost at the nth item
times n. However, it is important to understand all curves well enough
to choose intelligently between them.
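The smoothing effect can be illustrated with synthetic data. In the sketch below every number is hypothetical: unit hours follow an assumed 80-percent unit curve from 1,000 first-unit hours, and units 51 through 60 are inflated by 50 percent to imitate a production disturbance. The largest unit-to-unit increase is then compared in the unit series and in the cumulative average series.

```python
import math

B = math.log(0.80) / math.log(2.0)      # assumed 80-percent unit curve
FIRST_UNIT_HOURS = 1000.0               # hypothetical

def unit_hours(unit):
    """Synthetic unit hours with a disturbance on units 51-60."""
    hours = FIRST_UNIT_HOURS * unit ** B
    if 51 <= unit <= 60:                # hypothetical disturbance
        hours *= 1.5
    return hours

units = list(range(1, 101))
unit_series = [unit_hours(u) for u in units]

cumulative_average = []
running_total = 0.0
for u, h in zip(units, unit_series):
    running_total += h
    cumulative_average.append(running_total / u)

def largest_step_up(series):
    """Largest ratio of one value to the value immediately before it."""
    return max(b / a for a, b in zip(series, series[1:]))

print("unit curve:        ", round(largest_step_up(unit_series), 3))
print("cumulative average:", round(largest_step_up(cumulative_average), 3))
# Total cost of n items is the cumulative average at n times n.
print("total hours:", round(cumulative_average[-1] * len(units)))
```

With these assumptions the disturbance appears as a jump of nearly 50 percent in the unit series but moves the cumulative average by well under 1 percent per unit, which is why the unit curve is the better diagnostic.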


Applications 

The learning curve is used for a variety of purposes and in a
variety of contexts; how the curve is drawn will depend on the purpose
and the context. In long-range planning studies, for example, the curve
must be constructed on the basis of generalized historical data, and the
possible error is considerable. Empirical evidence does not support
the concept of a single slope for all fighter aircraft, all
solid-propellant missiles, or all spacecraft. Therefore, the practice
of assuming that manufacturing hours on the airframe will follow an
80-percent curve (as was common for many years) or that electronic
equipment will follow, say, a 90-percent curve, can lead to very large
estimating errors.

In regard to airframes, Table 2 shows the slope of the
manufacturing-hour curves for 25 post-World War II Air Force and Navy
aircraft and indicates that a slope steeper than 80 percent is the
rule. Since the learning-curve slopes of the table show important
differences, it would be desirable to relate slope to aircraft
characteristics. Such a relation is accomplished by a technique
suggested by the Planning Research Corporation. Separate estimating
equations based on aircraft characteristics are derived for four
different production quantities (10, 30, 100, and 300), and a learning
curve is developed from the estimates at these four points. However,
on a theoretical level the concern is with aircraft characteristics
that influence the rate of learning. It seems reasonable to expect
relatively little learning for a model that represents a small
modification over a preceding type, because the previous model would
have already absorbed a considerable learning effect. On the other
hand, if an aircraft contains radically new design features, a high
initial cost is to be expected, followed by a rapid decline with
increased production quantities. In other words, it has been suggested
that the "newness" of an aircraft should be a major determinant of
learning-curve slope, but explicit techniques for taking newness into
account have yet to be developed.

Table 2

LEARNING CURVES FOR MANUFACTURING
(Labor for Airframe Only)

                         Learning Curve
Aircraft                      (%)

Fighter                        77
Fighter                        73
Fighter                        74
Fighter                        73
Fighter                        78
Fighter                        71
Fighter                        74
Fighter                        76
Fighter                        77
Fighter                        79
Fighter                        82
Fighter                        76
Fighter                        73
Fighter                        74
Bomber                         76
Bomber                         73
Bomber                         70
Bomber                         71
Bomber                         79
Cargo                          74
Cargo                          78
Cargo                          77
Cargo                          75
Trainer                        74
Trainer                        75

Mean                           75
Standard Deviation            2.7

SOURCE: G. S. Levenson and S. M. Barro,
Cost-estimating Relationships for Aircraft
Airframes, The Rand Corporation, RM-4845-PR
(Abridged), May 1966, p. 56.

For estimating to be effective, therefore, learning curves must be
established on the basis of historical data relevant to the specific
problem. Such curves are equally applicable to missiles, electronic
equipment, aircraft, ships, and other types of equipment, but the slopes
may be different for each of these. A recent study of avionics, for
example, showed slopes ranging from 84 to 91 percent with a median
value of 88 percent. If a comparison is being made between two weapon
systems, one involving aircraft and the other missiles, the
learning-curve slope chosen for each could play a significant part in
the total system cost comparison. For example, the effect of using a
92-percent rather than a 90-percent cumulative average curve is an
increase of 25 percent in the total cost of 1500 items. As one would
guess, the situation is worse when steeper slopes are involved. If a
slope of 62 percent instead of 60 percent is assumed, there is a
42-percent difference in the cost of 1500 items and a 25-percent
difference in the cost of 100 items.* In practice, errors of this type
can be minimized by originating the curve at the estimated cost of the
100th unit rather than at the first. Table 3 shows how this reduces
the effect of a 2-percent change in slope on total cost.
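The sensitivity of total cost to the assumed slope can be sketched directly from the log-linear cumulative average formulation, in which the total cost of n units is the first unit cost times n raised to the power (1 + b). The function names are illustrative, and the curve is assumed to originate at unit 1.

```python
import math

def exponent(slope):
    """Exponent b for a slope expressed as a fraction (e.g., 0.90)."""
    return math.log(slope) / math.log(2.0)

def total_cost(slope, quantity, first_unit_cost=1.0):
    """Total cost of `quantity` units on a log-linear cumulative average curve."""
    return first_unit_cost * quantity ** (1.0 + exponent(slope))

def pct_increase(slope_low, slope_high, quantity):
    """Percent increase in total cost when the flatter slope is assumed."""
    ratio = total_cost(slope_high, quantity) / total_cost(slope_low, quantity)
    return 100.0 * (ratio - 1.0)

for low, high, qty in [(0.90, 0.92, 1500), (0.60, 0.62, 1500), (0.60, 0.62, 100)]:
    print(f"{low:.0%} -> {high:.0%}, {qty} units: "
          f"+{pct_increase(low, high, qty):.0f}%")
```

The results land within a point or two of the rounded percentages quoted above; because the first unit cost cancels in the ratio, the comparison does not depend on it.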

* The assumption regarding the type of curve is important. For
example, if a log-linear unit curve (rather than a log-linear
cumulative average curve) were assumed, these differences would be
only 25 and 13 percent, respectively.

Table 3

EFFECT OF VARYING SLOPE ASSUMPTIONS

                              Change in Total
                               Cost of 1500
Change in Slope                 Units (%)

From 90% to 92%
  Origin of curve:
    Unit 1                          25
    Unit 100                         9(a)

From 60% to 62%
  Origin of curve:
    Unit 1                          42
    Unit 100                        14

(a) If a log-linear unit curve is assumed,
this value would be less than 6 percent.

Once a few data points are available either for developmental or
production items, the situation should improve, but, as illustrated by
Fig. 11, the first few points may be misleading. Suppose an estimator
had been asked to calculate the cost of a large production contract
after the fabrication of the first 30 units. By fitting a curve to the
existing data he would have projected a learning curve with an 88- or
89-percent slope and at a level considerably higher than that later
experienced. In this situation it is important to realize that such a
flat learning curve for airframe production is improbable. The
estimator should have an idea of what the answer is likely to be and
should investigate differences.
Fig. 11 — Direct labor hours for a transport aircraft

With a small sample of data, where a learning curve is fitted to
a few points, the correlation may be perfect, i.e., all the points may
lie on the fitted line, but the results can still be unreliable. The
points used in fitting must be sufficiently numerous and reasonably
homogeneous with the points implied by extending the curve to offer a
reasonable probability of success in predicting costs.
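For reference, a least-squares fit of a log-linear curve can be sketched as an ordinary regression of log cost on log quantity. This is a generic illustration, not a procedure prescribed by the text; the sample points are the four cumulative average plot points computed earlier in this section, and the function name is hypothetical.

```python
import math

def fit_log_linear(points):
    """Least-squares fit of ln(y) = ln(a) + b * ln(x).

    Returns (a, b, slope_pct), where slope_pct = 100 * 2**b.
    """
    log_x = [math.log(x) for x, _ in points]
    log_y = [math.log(y) for _, y in points]
    n = len(points)
    mean_x, mean_y = sum(log_x) / n, sum(log_y) / n
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b, 100.0 * 2.0 ** b

# Cumulative average plot points from the earlier table.
CUM_AVG_POINTS = [(10, 583), (20, 510), (50, 415), (100, 355)]
a, b, slope_pct = fit_log_linear(CUM_AVG_POINTS)
print(f"first-unit value {a:.0f}, exponent {b:.3f}, slope {slope_pct:.1f}%")
```

A close fit to four points such as these says nothing by itself about how far the line can safely be extended, which is the caution raised in the paragraph above.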

The manufacturing history of the item to be fabricated is the most
valuable information the estimator can have. Variations from the norm
may be caused by particular problems, configuration changes, or changes
in manufacturing methods. In the curve of Fig. 11, the initially flat
portion (out to the 30th airframe) is explained by the manufacturer as
being typical of the initial production period. In this manufacturer's
experience, the curve begins to steepen when

1. Manpower has stabilized or reached its peak.

2. The engineering configuration has stabilized.

3. The parts flow has stabilized.

Thus, it may be preferable to explain certain points and exclude them
rather than to include them and bias the curve in height or slope.

Whether to include all the points depends, in addition, on the
anticipated use of the resulting curve. If a unit cost curve that
includes all costs and changes is desired, a line of best fit through
the unit plot points may be appropriate. If the curve is to be used in
negotiating a follow-on contract, the effect of changes should be
eliminated by constructing a curve through the lower portion of the
plotted individual unit points, as in Fig. 12. In effect, this assumes
that the introduction of changes raises the hours initially but that
these decrease again to the approximate level of the original curve.

* It is also possible to have a segmented unit curve, as implied
by Fig. 11, and several manufacturers subscribe to this concept.

Fig. 12 — Effect of changes on the learning curve

Whatever the basic technique, it is important to remember that on
logarithmic grids the points at the right are usually more important
than those at the left. In visually fitting a line, the analyst should
avoid the tendency to be unduly influenced by plot points for small
early lots. Early units are often incomplete because they are used for
test purposes. It is equally possible that early units will include
certain nonrecurring problems incident to startup and for this reason
may be above the level suggested by later plot points.

Of course, variations in unit cost (or hour) data may happen for
reasons other than the introduction of changes. An interruption in
production can be an important factor. Interruptions may occur because
of production cutbacks, labor disturbances, or funding problems.
Whatever the reason, if significant time periods are involved, the
learning curve will be affected in much the same way as illustrated in
Fig. 12. Those units produced after a significant amount of
interruption can be expected to exhibit sharp increases in costs,
followed by a recovery to the approximate projected level of the
earlier preinterruption period.